1 Introduction



There are two widely held and mutually inconsistent conceptions of ability and scholastic achievement
tests. The first view claims that cognitive ability is essentially fixed at a relatively early age (around
age seven) and is virtually unchanged afterward. According to this view, achievement tests and IQ tests
measure the same fundamental cognitive skill. The correlation between IQ and achievement tests is high
and proponents of this view use these two types of tests interchangeably. According to scholars who
advocate this point of view, schooling and other influences barely budge measured IQ. (See the evidence
summarized in Herrnstein and Murray, 1994, Appendix 2.) A consensus estimate in this literature is that
a year of schooling raises measured IQ by about one point (Jencks, 1972; Herrnstein and Murray, 1994).^1

A second widely held view claims that schooling raises achievement measured by tests and more
successful types of schooling raise measured achievement more. This is the premise of large scale testing
programs designed to monitor the performance of schools. Debates about the effectiveness of vouchers and
interventions hinge on their effects on measured achievement (see, e.g., Hanushek, 2002). This literature
implicitly separates out latent ability (IQ) from measured ability and views schooling as a mechanism for
either enhancing or revealing ability. Proponents of this view argue that schooling can increase measured
ability by as much as 2 to 4 points (Winship and Korenman, 1997; Neal and Johnson, 1996), or 2.9 to 5.7
AFQT points (2.7 to 5.4 percentage points).

This paper presents evidence that the measure of IQ used by Herrnstein and Murray is strongly affected
by schooling. Postulating that latent ability cannot be affected by schooling, we test whether manifest
ability is affected by schooling when both schooling and manifest ability are affected by latent ability.
Manifest ability is widely regarded as a determinant of socioeconomic success. Gaps in test scores across
socioeconomic groups are widely viewed as a major source of social problems (Jencks and Phillips, 1998;
Herrnstein and Murray, 1994). We examine whether measured ability gaps can be eliminated by schooling.

Our measures of ability are the ASVAB achievement (competency) tests used to screen persons entering
the military. ASVAB stands for Armed Services Vocational Aptitude Battery and is described in more
detail below. We find that schooling, especially in the high school years, is an important determinant of
measured achievement. It operates differently at different latent ability levels.

In order to establish these conclusions, we need to address the problem of reverse causality. There
is a well-established empirical regularity that measured test scores predict schooling. Individuals choose
to attend school in part based on their own intelligence, which is measured by test scores. In addition,
admission into colleges and fellowship support is based, in part, on scores on tests like the Scholastic
Assessment Test (SAT). The central econometric question addressed in this paper is how to characterize
and solve the problem of joint causality: schooling causing test scores and test scores causing schooling.

^1 IQ is assumed to have mean 100 and standard deviation 15. Many of the papers in the literature obtain estimates
using the Armed Forces Qualification Test (AFQT), the test used in this paper, which has a scale of 0-105. Typically
estimates are converted into "IQ points" by computing the effect of education in terms of standardized AFQT score and
then, assuming that a standard deviation increase in AFQT is equivalent to a standard deviation increase in IQ score, by
further multiplying by 15. Herrnstein and Murray (1994) estimate an increase of 1.1 IQ points per year of education, or
1.6 AFQT points (1.5 percentage points), using our estimate of the standard deviation of AFQT, 21.6.
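
The conversion described in this footnote can be sketched numerically. The standard deviations (15 and 21.6) and the 0-105 scale are taken from the footnote; reading "percentage points" as AFQT points rescaled by the 105-point range is an assumption made here for illustration, and the function names are hypothetical.

```python
# Constants from the footnote: IQ standard deviation, the authors' estimate of
# the AFQT standard deviation, and the top of the 0-105 AFQT scale.
IQ_SD = 15.0
AFQT_SD = 21.6
AFQT_MAX = 105.0


def iq_to_afqt_points(iq_points):
    """Convert an effect in IQ points to AFQT points via standard-deviation units."""
    return iq_points / IQ_SD * AFQT_SD


def afqt_points_to_percentage_points(afqt_points):
    """Assumed rescaling: express AFQT points as a percentage of the 0-105 range."""
    return afqt_points / AFQT_MAX * 100.0


afqt_effect = iq_to_afqt_points(1.1)           # Herrnstein and Murray: 1.1 IQ points
pct_effect = afqt_points_to_percentage_points(afqt_effect)
print(round(afqt_effect, 1), round(pct_effect, 1))
```

Under these assumptions the 1.1 IQ points per year reported for Herrnstein and Murray (1994) map to the 1.6 AFQT points and 1.5 percentage points quoted in the footnote.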





Our solution is based on a model of test scores as manifestations of latent ability (and other determinants)
with schooling determined by latent ability (and other determinants). Our framework accounts for ceiling
effects (on some easy tests, students with very different ability levels get perfect scores) and endogeneity
of schooling (which includes choice of date of entry into schooling as well as choice of final schooling level).

We find that the effects of schooling on test scores for a given level of ability are approximately linear
across schooling levels. Effects are slightly larger for those with lower ability. Schooling increases the
AFQT score on average between 2 and 4 percentage points. This is roughly twice as large as the effect
claimed by Herrnstein and Murray (1994).

The plan of this paper is as follows. Section 2 presents our framework and discusses some important
conceptual issues. Section 3 applies the method of control functions, developed by Heckman (1976, 1980)
and Heckman and Robb (1985, 1986), to identify schooling effects on tests using a special feature of the
National Longitudinal Survey of Youth (NLSY) data. A nonparametric version of the method is developed,
but it suffers from certain practical limitations in more general cases. Section 4 presents a parametric
econometric model motivated by choice theory for the joint determination of schooling and test scores.
This method supplements the nonparametric control function method with additional identifying
information, which lets us determine the effects of schooling on test scores in more general data sets
than the NLSY and account for ceiling effects on tests. Section 5 presents empirical
results. Section 6 concludes. Appendix A describes the data. The estimation algorithm for the control
function approach is presented in Appendix B. The likelihood and the Bayesian computational methods
used to estimate it are presented in Appendix C.

2 The Relationship Between Ability and Schooling 

Let $T(s)$ be the test score of a person with $s$ years of schooling at the time the test is taken. For notational
simplicity, we keep implicit the conditioning on all other variables that determine $T(s)$ except latent ability
$f$. The other variables might include age, socioeconomic status of the parents, and other environmental
and genetic factors. We account for some of these additional variables in our empirical work.

Our model of test scores is based on an extension of the factor analysis model used in psychometrics
(see, e.g., Lord and Novick, 1968). Test score $T(s)$ is a manifestation of latent ability $f$ mediated by
schooling:

$$T(s) = \mu(s) + \lambda(s) f + \varepsilon(s) \qquad (1)$$

where it is assumed that $\varepsilon(s)$ is independent of $f$. Both $f$ and $\varepsilon(s)$ are assumed to have zero means. This
amounts to a normalization and a definition of the mean, $\mu(s)$. We extend the standard model of factor
analysis by allowing the level of $s$ selected to depend on $f$. For externally manipulated levels of schooling,
$\mu(s)$ in equation (1) is the effect of schooling that is uniform across latent ability levels and $\lambda(s)$ is the
effect of schooling on revealing or transforming latent ability $f$. The marginal causal effects of changing
schooling from $s'$ to $s$ on levels and slopes are $\mu(s) - \mu(s')$ and $\lambda(s) - \lambda(s')$ respectively, using the usual
ceteris paribus logic familiar to all economists.^2



^2 Throughout this paper we maintain the traditional separable-in-the-errors model of equation (1). A more general



The psychometric and educational testing literatures are fundamentally ambiguous about what constitutes
cognitive ability. Is it $f$, $T(s)$, $\mu(s)$ or $\mu(s) + \lambda(s) f$? Neal and Johnson (1996), Winship and
Korenman (1997), Winship (2001), and Herrnstein and Murray (1994) take measured test scores ($T(s)$)
to be cognitive ability. Yet the logic of IQ testing interprets $f$ as cognitive ability. A reinterpretation of
equation (1) writes $\lambda(s) f$ as ability determined at schooling level $s$. Knowing only $T(s)$ and $S = s$, we
cannot decide which of these interpretations is correct. Without further information, the model is
fundamentally underidentified because we do not observe $f$. We can identify the combination of parameters
in

$$E(T(s) \mid S = s) = \mu(s) + \lambda(s) E(f \mid S = s) + E(\varepsilon(s) \mid S = s) \qquad (2)$$

but the causal status of an estimated effect of $S$ is unclear because both $E(f \mid S = s)$ and $E(\varepsilon(s) \mid S = s)$
may depend on $S$. Thus latent cognitive ability ($f$) may determine $S$ and so may measured ability ($T(s)$
and hence $\varepsilon(s)$ given $f$). If the test studied does not directly affect schooling decisions, e.g. through its
use in admission criteria, as is the case for the test analyzed in this paper, then $E(\varepsilon(s) \mid S = s) = 0$.^3

The empirical literature recognizes the problem of reverse causality, and adopts different strategies
for identifying different parameters. Herrnstein and Murray (1994) and Winship and Korenman (1997)
implicitly adopt $\mu(s)$ as their parameter of interest, assume that $\mu(s) = s\beta$ (linearity) and use an "early"
test score (obtained at an earlier age) to proxy $f$.^4 Let proxy $P$ be

$$P = \gamma_0 + \gamma_1 f + \varepsilon(P) \qquad (3)$$

where $f$ and $\varepsilon(P)$ are independent, and $\gamma_0, \gamma_1$ are assumed not to be functions of the $S$ in (1). Solving for
$f$ and substituting into (1) we obtain

$$T(s) = \mu(s) - \lambda(s)\frac{\gamma_0}{\gamma_1} + \frac{\lambda(s)}{\gamma_1} P + \left[\varepsilon(s) - \frac{\lambda(s)\,\varepsilon(P)}{\gamma_1}\right]. \qquad (4)$$

Observe that the composite error is correlated with $P$ unless $P$ is a perfect proxy for $f$ ($\varepsilon(P) = 0$),
the implicit assumption used by Herrnstein and Murray (1994).^5 Herrnstein and Murray also implicitly
assume that $\lambda(s)$ does not depend on $s$. Then, using ordinary least squares applied to (4), they estimate
the marginal effect of schooling, which in their setup is $\beta$ ($\mu(s) = s\beta$). If $\lambda(s) = \lambda$, but $\varepsilon(P) \neq 0$, and if
$\lambda\gamma_1 > 0$ (so that $f$ affects $P$ and $T(s)$ in the same way), then least squares based on (3) is upward biased.
More generally, if $\lambda$ depends on $s$, the bias is ambiguous and depends on specific parameter configurations.
The combination of parameters $\mu(s) - \lambda(s)\gamma_0/\gamma_1$ becomes the implicit parameter estimated and it does not
answer the questions posed in the literature.

nonseparable model would be desirable but is beyond the scope of this paper.

^3 Even though test takers know their scores, the scores are not directly used by schools or firms to screen persons
and we assume that they do not affect subsequent actions.

^4 These authors also include additional control variables which we do not discuss.

^5 Pakes and Olley (1995) make the same assumption in a different context.

Winship and Korenman (1997) consider the problem of measurement error in their proxy $P$. They draw
on work by Ashenfelter and Krueger (1994) who claim that the reliability (the proportion of variance of
$P$ that is true, $\gamma_1^2\sigma_f^2$, relative to the total variance $\gamma_1^2\sigma_f^2 + \sigma_{\varepsilon(P)}^2$) of IQ measures is typically above 0.9.
Winship and Korenman carry out a variety of sensitivity analyses, estimating the model under different
assumptions about the reliability of the early IQ score, which they let take on values between 0.8 and 1.
They obtain a wide range of estimates of the effect of schooling on AFQT, from 1.5 to 5 points. Correcting
for measurement error under what Winship and Korenman believe to be "reasonable assumptions about
the extent of measurement error," they estimate the effect of education to be 2.7 IQ points per year of
school and they state that "a year of education most likely increases IQ by somewhere between 2 and 4
points."
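
The sensitivity of the proxy regression to reliability can be illustrated with a small simulation. This is only a sketch under assumed values: the data-generating parameters (a true schooling effect of 1, a loading of 3 on latent ability, schooling that rises with ability) and the function names are hypothetical and are not estimates from the studies discussed above.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def schooling_coefficient(reliability, n=200_000, seed=0):
    """OLS coefficient on schooling when an early test score proxies latent ability."""
    rng = np.random.default_rng(seed)
    beta, lam = 1.0, 3.0                        # hypothetical mu(s) = s*beta and lambda
    f = rng.normal(size=n)                      # latent ability, mean zero, variance one
    s = 12.0 + 2.0 * f + rng.normal(size=n)     # schooling rises with latent ability
    # Proxy P = f + e(P); the error variance is set to hit the target reliability,
    # i.e. reliability = Var(f) / (Var(f) + Var(e(P))).
    err_var = (1.0 - reliability) / reliability
    p = f + np.sqrt(err_var) * rng.normal(size=n)
    t = beta * s + lam * f + rng.normal(size=n) # test score as in equation (1)
    X = np.column_stack([np.ones(n), s, p])
    return ols(t, X)[1]                         # coefficient on schooling


for r in (1.0, 0.9, 0.8):
    print(r, round(schooling_coefficient(r), 2))
```

In this design a perfect proxy recovers the assumed schooling effect, while lower reliability pushes the least squares coefficient upward, which is the direction of bias described above when ability raises both the proxy and the later test score.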

Neal and Johnson take a different approach. They choose $\beta$ in the specification $\mu(s) = s\beta$ as their
parameter of interest and use month of birth (which determines years of schooling attained by a given
birth cohort) as an instrument to avoid dependence of $S$ on $f$.^6 This forced variation in schooling attained
among children of the same nominal birth cohort is a source of identifying variation. We estimate a richer
set of parameters and consider how schooling affects test scores at different levels of the latent ability
distribution. However, our estimates of the same parameter are in agreement with theirs.

A variety of other studies, surveyed by Ceci (1991), rely on various "natural experiments" of uncertain
quality. Winship and Korenman (1997) survey and criticize this literature.

In this paper, we estimate $\mu(s)$ and $\lambda(s)$ for different levels of schooling without imposing the
parametric restrictions used in the previous literature. We explicitly account for the endogeneity of completed
schooling. In addition we estimate the distribution of latent ability ($f$) and compare it with measured
ability. We can identify the effect of schooling on measured test scores at different latent ability ($f$) levels.
This allows us to identify where in the overall distribution of ability schooling interventions are the most
effective. We first develop estimators based on the principle of control functions.

3 Simple Identification Strategies Based on Control Functions 

Our first approach to this problem exploits an unusual feature of the NLSY data. The test we study is
given to a nationally representative sample of people. Some people who take the test are in school while
others have finished school. We observe completed schooling for all individuals. Let $S_T$ denote schooling
that a person has at the date of the test. We observe the test score $T(S_T)$ which can be expressed as

$$T(S_T) = \mu(S_T) + \lambda(S_T) f + \varepsilon(S_T). \qquad (5)$$

Letting $S$ denote the final level of schooling that is actually attained, $S \geq S_T$. Let $A$ be the age at which
a person is tested. If we redefine age so that schooling starts at age 0 and if we assume that dropouts do
not return to school, then we observe $S_T = A < S$ if the test date comes before a person has completed
his schooling.^7 If he has completed schooling by the time of the test, then we observe $S_T = S$.

^6 In most school districts, in a given year any 5-year-old child whose birthday falls after October 1 must wait to start
school in the following year.

^7 Cameron and Heckman (2001) present evidence that few high school dropouts return to school.

Using the control function approach introduced in Heckman (1976, 1980) and Heckman and Robb
(1985, 1986), and assuming no maturation effects (no independent effect of age on performance on the
test) and that everyone starts school at the same age, we may write observed tests conditional on final
schooling and schooling at the test date as

$$E(T(S_T) \mid S_T = s_T, S = s) = \mu(s_T) + \lambda(s_T) E(f \mid S_T = s_T, S = s) + E(\varepsilon(S_T) \mid S_T = s_T, S = s). \qquad (6)$$

To simplify the notation we keep other conditioning variables implicit.

Because sampling is random across ages, if individuals consider only their final schooling level when
making schooling decisions, irrespective of their path to schooling, and there is no dropping out and
re-entry, conditional on $S = s$ the observed $S_T$ is random with respect to $f$. Thus $E(f \mid S_T = s_T, S = s) =
E(f \mid S = s)$. Further, if the test is not used to make decisions about schooling, $E(\varepsilon(S_T) \mid S_T = s_T,
S = s) = 0$.

Under these assumptions we obtain

$$E(T(S_T) \mid S_T = s_T, S = s) = \mu(s_T) + \lambda(s_T) E(f \mid S = s). \qquad (7)$$

From this equation it is clear that we cannot identify the scale of $f$ without some normalization. Setting
$\lambda(1) = 1$ is one such normalization. We can identify $\lambda(s_T)$ up to the normalization because for two
different schooling levels $s, s' \geq s_T$, $s \neq s'$,

$$E(T(S_T) \mid S_T = s_T, S = s) - E(T(S_T) \mid S_T = s_T, S = s') = \lambda(s_T)\left[E(f \mid S = s) - E(f \mid S = s')\right].$$


Assuming $\lambda(s_T) \neq 0$, we may form the ratio

$$\frac{E(T(S_T) \mid S_T = s_T', S = s) - E(T(S_T) \mid S_T = s_T', S = s')}{E(T(S_T) \mid S_T = s_T, S = s) - E(T(S_T) \mid S_T = s_T, S = s')} = \frac{\lambda(s_T')}{\lambda(s_T)}$$

for two values $s_T \neq s_T'$, both less than or equal to $s, s'$. Therefore with one normalization we can identify
all of the $\lambda(s_T)$, $s_T = 1, \ldots, \bar{S} - 1$ (we cannot identify $\lambda(\bar{S})$ because there is only one possible value of $s$
for $S_T = \bar{S}$).

Taking expectations with respect to $S_T$ alone we obtain

$$E(T(S_T) \mid S_T = s_T) = \mu(s_T) + \lambda(s_T) E[f \mid S_T = s_T].^8 \qquad (8)$$

Recall that we know $\lambda(s_T)$, $s_T = 1, \ldots, \bar{S} - 1$, from the preceding argument. Subtracting (8) from (7) we
obtain

$$E(T(S_T) \mid S_T = s_T, S = s) - E(T(S_T) \mid S_T = s_T) = \lambda(s_T)\left[E(f \mid S = s) - E(f \mid S_T = s_T)\right]$$

so we can identify for $s \geq s_T$, $s_T = 1, \ldots, \bar{S} - 1$,

$$E(f \mid S = s) - E(f \mid S_T = s_T) = \frac{E(T(S_T) \mid S_T = s_T, S = s) - E(T(S_T) \mid S_T = s_T)}{\lambda(s_T)}. \qquad (9)$$



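^8 Note that $E(\varepsilon(S_T) \mid S_T = s_T) = E[E[\varepsilon(S_T) \mid S_T = s_T, S = s] \mid S_T = s_T] = 0$.

Let $E(f \mid S = s) = a_s$ and $E(f \mid S_T = s_T) = b_{s_T}$. We can form a matrix of the following identifiable
combination of parameters:

$$\begin{pmatrix}
a_{\bar{S}} - b_1 & a_{\bar{S}-1} - b_1 & \cdots & a_2 - b_1 & a_1 - b_1 \\
a_{\bar{S}} - b_2 & a_{\bar{S}-1} - b_2 & \cdots & a_2 - b_2 & * \\
\vdots & \vdots & & & \vdots \\
a_{\bar{S}} - b_{\bar{S}-1} & a_{\bar{S}-1} - b_{\bar{S}-1} & * & \cdots & *
\end{pmatrix}$$

where $*$ in a cell denotes the absence of data on the entry. We also know as a consequence of $E(f) = 0$
that if we define $P_j = \Pr(S = j)$

$$\sum_{j=1}^{\bar{S}} a_j P_j = 0. \qquad (10)$$

Letting $\tilde{P}_j$ be $\Pr(S_T = j)$, we also obtain

$$\sum_{j=1}^{\bar{S}} b_j \tilde{P}_j = 0. \qquad (11)$$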

Taking a weighted sum across row 1 of the matrix, we identify $b_1$ since

$$\sum_{j=1}^{\bar{S}} (a_j - b_1) P_j = \sum_{j=1}^{\bar{S}} a_j P_j - b_1 \sum_{j=1}^{\bar{S}} P_j = -b_1$$

by (10) and the fact that $\sum_{j=1}^{\bar{S}} P_j = 1$. Going across the first row element by element, we obtain $a_j$, $j =
1, \ldots, \bar{S}$. Going down the first column, we obtain the remaining $b_j$, $j = 1, \ldots, \bar{S} - 1$. Using (11) we identify
$b_{\bar{S}}$. Thus the model is fully identified except for $\lambda(\bar{S})$.
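
The identification argument above can be traced numerically. The sketch below is an illustration under stated assumptions, not the estimation algorithm of Appendix B: it uses four hypothetical schooling levels, made-up population parameters, and an assumed distribution of test-date schooling, then recovers $\lambda(s_T)$, $a_s$, $b_{s_T}$ and $\mu(s_T)$ from the cell means in (7) and the marginal means in (8) alone.

```python
import numpy as np

S_BAR = 4                                     # hypothetical number of schooling levels

# "True" parameters (all hypothetical), with lambda(1) = 1 as the normalization.
mu_true = np.array([10.0, 14.0, 17.0, 19.0])
lam_true = np.array([1.0, 1.3, 1.6, 2.0])
P = np.array([0.2, 0.3, 0.3, 0.2])            # Pr(S = j)
a_true = np.array([-1.0, -0.4, 0.2, 1.3])
a_true = a_true - P @ a_true                  # impose E(f) = sum_j a_j P_j = 0

# Assumed joint Pr(S_T = i, S = j): test-date schooling uniform on 1..j given S = j.
joint = np.zeros((S_BAR, S_BAR))
for j in range(S_BAR):
    joint[: j + 1, j] = P[j] / (j + 1)
cond = joint / joint.sum(axis=1, keepdims=True)   # Pr(S = j | S_T = i)

# Observable moments: equation (7) cell means and equation (8) marginal means.
cell = mu_true[:, None] + lam_true[:, None] * a_true[None, :]
cell[joint == 0] = np.nan                     # cells with s < s_T are never observed
marg = np.nansum(cond * cell, axis=1)

# Step 1: ratios of cell-mean differences give lambda(s_T), s_T = 1..S_BAR-1.
top, nxt = S_BAR - 1, S_BAR - 2               # indices of s = S_BAR and s' = S_BAR - 1
lam = np.full(S_BAR, np.nan)                  # lambda(S_BAR) stays unidentified
lam[0] = 1.0
for i in range(1, S_BAR - 1):
    lam[i] = (cell[i, top] - cell[i, nxt]) / (cell[0, top] - cell[0, nxt])

# Step 2: equation (9) gives the matrix of a_s - b_{s_T}.
D = (cell - marg[:, None]) / lam[:, None]

# Step 3: row 1 weighted by P_j gives b_1 via (10); then a_j, then remaining b_j.
b = np.full(S_BAR, np.nan)
b[0] = -(P @ D[0])
a = D[0] + b[0]
b[1:S_BAR - 1] = a[top] - D[1:S_BAR - 1, top]
P_T = joint.sum(axis=1)                       # Pr(S_T = j)
b[top] = -(P_T[:top] @ b[:top]) / P_T[top]    # equation (11)

# Step 4: back out mu(s_T) from the marginal means in (8).
mu = marg[:top] - lam[:top] * b[:top]
print(lam[:top], a, b, mu)
```

In this example every recovered quantity matches the value used to generate the moments, and $\lambda(\bar{S})$ is left as missing because only one cell is observed when $S_T = \bar{S}$.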

Attractive as these results are, there are three reasons to be cautious about estimates derived from this
identification strategy: (a) Age effects (maturation effects) may affect test scores independently of any
effect of schooling because persons may acquire life experiences that raise their test scores independently
of their schooling at the date of the test. Our procedure has to be modified to distinguish age effects
from schooling effects. (b) Persons start school at different ages. Less able people (those with lower $f$)
may start school at later ages, making an assumption of an identical school starting age for all persons
problematic. Simply conditioning on the starting age $N$ to solve this problem is not satisfactory given
its likely dependence on $f$. (c) In principle there might be a separate $N$ effect on test scores apart from
the dependence of $N$ on $f$ if there are discouragement effects (students older than their classmates may
feel inferior and be less motivated). The confluence of an endogenous $N$ and independent age effects is
problematic.

Modeling the starting age $N$ along with the schooling level $S$ does not pose any conceptually new
problem as long as there are no age-at-test effects. We can use different $(S_T, N)$, $(S, N)$ pairs and replace
$(S_T, S)$ in the preceding analysis. Data cells may thin out but the previous identification strategy works.

Allowing for age in addition to $N$ produces a fundamental identification problem if we maintain the
"no return to school for dropouts" assumption. Observe that by definition $S_T = \min\{A - N, S\}$, so that
$S$, $S_T$ and $A - N$ cannot be freely varied. A more general model that incorporates age and entry writes
the test score $T$ as $T(A, S_T, S, N)$ where $A$ is the age at the test, $S_T$ is the level of schooling at the
date of the test, $S$ is the final level of schooling attained and $N$ is the age at which the person enters
school. For simplicity normalize $N = 0$ to be the "normal age" of starting school. If $A$ and $S_T$ both affect
the measured test score directly, while $S$ and $N$ do not directly affect the test score but potentially are
stochastically dependent on latent ability $f$, we may write

$$T(A, S_T, S, N) = \mu(A, S_T) + \lambda(A, S_T) f + \varepsilon(A, S_T).^9 \qquad (12)$$



Then conditioning on observable $(A, S_T, S, N)$ we obtain

$$E(T(A, S_T, S, N) \mid A = a, S_T = s_T, S = s, N = n) = \mu(a, s_T) + \lambda(a, s_T) E(f \mid S = s, N = n) \qquad (13)$$

where we assume $\varepsilon(A, S_T)$ is independent of all other variables. Observe that when $S_T < S$, fixing $N$ and
$S_T$ determines $A$:

$$A = S_T + N. \qquad (14)$$

This exact linear dependence does not apply to persons with completed schooling ($S_T = S$). In that
subpopulation, $S$ and $S_T$ cannot be independently varied so the control function identification strategy
previously developed breaks down, but the exact linear dependence (14) does not hold so that we can
independently vary $A$ and $S_T = S$ for each $N$. If we parameterize $\mu(A, S_T)$ and $\lambda(A, S_T)$, we can identify
separate effects of age and schooling at the test date.^10 With sufficient structure, we can extrapolate
$\mu(A, S_T)$ and $\lambda(A, S_T)$ back to ages and schooling levels at schooling levels $S_T < S$. This method is
pursued in Section 4.
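
The role of the linear dependence (14) can be seen in a small numerical check. The sketch assumes a design that is linear in age, schooling at the test date and starting age, and it uses simulated values (including a hypothetical gap between leaving school and the test date) purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# In-school sample (S_T < S): age is pinned down by equation (14), A = S_T + N.
s_t = rng.integers(1, 12, size=n)             # schooling at the test date
start = rng.integers(0, 2, size=n)            # starting age N in {0, 1}
age_in_school = s_t + start
X_in = np.column_stack([np.ones(n), age_in_school, s_t, start])

# Completed-schooling sample (S_T = S): age exceeds S + N by a varying gap.
gap = rng.integers(0, 6, size=n)              # hypothetical years since leaving school
age_done = s_t + start + gap
X_done = np.column_stack([np.ones(n), age_done, s_t, start])

rank_in = np.linalg.matrix_rank(X_in)         # one column is redundant
rank_done = np.linalg.matrix_rank(X_done)     # all four columns vary independently
print(rank_in, rank_done)
```

In the in-school sample the design matrix loses one rank, so separate age, schooling and starting-age effects cannot be distinguished there; in the completed-schooling sample the columns are linearly independent, which is what permits the parametric extrapolation described above.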

In our data, there are effectively two starting ages $N \in \mathcal{N} = \{0, 1\}$. Given our "no return to school
for dropouts" assumption, in the sample $S > S_T$, people who start school one year later are also one
year older at schooling level $S_T$ than are people who start school at a normal age. We cannot identify a
separate $N$ effect from an $A$ effect.

If we condition on each value of $N = n$ and repeat the preceding identification argument for each $N$,
we identify $\mu(s_T, n)$ and $\lambda(s_T, n)$, $s, s' \geq s_T$, from the sample $S > S_T$ by conditioning on $S_T = s_T$ and
$N = n$ in (8). When $N = 0$, $S_T = A$ if schooling is incomplete at the test date ($S > S_T$). We identify
a joint schooling and age effect for each $N$. When $N = 1$, we can identify the effect of being one year
older on $\mu(s_T)$ and $\lambda(s_T)$ for samples in which $S > S_T$. This effect is indistinguishable from the effect of
starting one year later. We can test for an age (at test or entry) effect by testing $\mu(s_T, 0) = \mu(s_T, 1)$ and
$\lambda(s_T, 0) = \lambda(s_T, 1)$.^11 This argument can be modified in a straightforward way to account for the case of
more than two elements in $\mathcal{N}$.

While intuitively appealing, the method based on control functions does not exploit all of the
information in the $S = S_T$ sample. Data where $S = S_T$ is the more commonly occurring case. It is not
straightforward to use the control function method to account for ceiling effects. When $S = S_T$, it is
possible in principle to isolate separate $A$, $N$ and $S$ effects. We now present a different method designed
to analyze the entire sample more fully.

^9 If $N$ causally affects the test, then (12) is modified to read $T(A, S_T, S, N) = \mu(A, S_T, N) + \lambda(A, S_T, N) f + \varepsilon(A, S_T, N)$.

^10 Thus with $\mu(A, S_T) = \mu_1(A) + \mu_2(S_T)$ and $\lambda(A, S_T) = \lambda_1(A) + \lambda_2(S_T)$ we can break these linear dependencies.
Multiplicative versions can work as well. This is the strategy used in Section 4 to achieve identification of these effects. See the
closely related identification analysis of Heckman and Vytlacil (2001).

^11 Observe that for persons for whom $S = S_T$, age at test is not restricted by (14). Thus we can in principle identify age
effects when we use $S = S_T$ observations, but we cannot use the control function method developed in this section to solve
the selection problem.



4 A Discrete-Continuous Econometric Model of Schooling and
Test Scores

This section develops a more explicitly structured semiparametric model that does not rely on special
features of the NLSY data and that enables us to condition more finely. The model also enables us to link
our work to more conventional models of schooling and wages, and identify separate $S$, $A$ and $N$ effects.
Initially we assume $S = S_T$ and then we extend the analysis to allow for the case $S > S_T$.

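Unlike the control function method developed in Section 3, the method discussed in this section
requires more than one test. Suppose that we have data on $K$ ($\geq 2$) tests associated with different levels
of schooling $S = s$. Array the tests into a vector

$$T(s, x) = \mu(s, x) + Q(s, x)$$

where the $k$th component of $Q$, $Q_k(s, x)$, has a factor structure $Q_k(s, x) = \lambda_k(s) f + \varepsilon_k(s)$, $k = 1, \ldots, K$,
$s = 1, \ldots, \bar{S}$, like the one used in Sections 2 and 3. Exact stochastic specifications are given in Section 4.1.
We use the following notation for the stacked vectors:

$$T(s, x) = \begin{pmatrix} T_1(s, x) \\ \vdots \\ T_K(s, x) \end{pmatrix}, \quad
\mu(s, x) = \begin{pmatrix} \mu_1(s, x) \\ \vdots \\ \mu_K(s, x) \end{pmatrix}, \quad \text{and} \quad
Q(s, x) = \begin{pmatrix} Q_1(s, x) \\ \vdots \\ Q_K(s, x) \end{pmatrix}.$$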
We initially work with $Q$ and produce a semiparametric identification theorem for the distribution
of $Q$ and other variables. Then we identify the distributions of the components of the $Q$. The $X$ are
determinants of tests. We assume $Q(s, x) \perp\!\!\!\perp X$ throughout. We observe $T(s, x)$ only if $S = s$. The
schooling states in this section can be defined in a sufficiently general way to include different
schooling-entry ages as different states. Other definitions for the states are possible (e.g. the Cartesian product of
schooling, entry age, schooling quality, etc.), so $S$ can be interpreted in a general way.

In order to account for the endogeneity of schooling, we construct the following model of schooling
choice, which we adjoin to the system of test scores:

$$V(s) = \varphi_s(Z) + \eta(s), \quad s = 1, \ldots, \bar{S}, \qquad (15)$$

where $V(s)$ is the utility associated with schooling level $s$, and $Z$ is a vector of determinants of utility.
We assume that $\eta = (\eta(1), \ldots, \eta(\bar{S}))$ is absolutely continuous with support $\mathbb{R}^{\bar{S}}$. This joint system of
test scores and choice equations is a mixed discrete-continuous choice model as in Heckman (1974a, b).
Optimal schooling is $\hat{s} = \arg\max_s \{V(s)\}_{s=1}^{\bar{S}}$. The $Z$ variables may be state-specific or general. Sufficient
conditions for nonparametric identifiability of versions of this model are available in the literature. We
present a new analysis.

We observe $T(s, x)$ for each schooling level conditional on $\hat{s} = s$. We assume:

$(Q(s, x), \eta) \perp\!\!\!\perp (Z, X)$, for all $s = 1, \ldots, \bar{S}$.
The $(Q(s, x), \eta)$ have zero means and finite variances. (A-1)

$(Q(s, x), \eta)$ for all $s = 1, \ldots, \bar{S}$ are absolutely continuous with support $\mathbb{R}^{K + \bar{S}}$. (A-2)

Under these assumptions, we can write

$$\Pr(T(s, x) < t \mid \hat{s} = s, X = x, Z = z) = \Pr(Q(s, x) < t - \mu(s, x) \mid V(s) > V(s'),\ s' \neq s,\ s' = 1, \ldots, \bar{S}), \quad s = 1, \ldots, \bar{S}, \qquad (16)$$

where both $t$ and $T(s, x)$ are vectors.

Adapting an argument from Heckman and Honore (1990), for each choice $\hat{s} = s$ we can trace out each
of the components of $\mu(s, x)$ over their supports for each corresponding component of $t$ up to intercepts
which we can obtain by a limit argument presented below.

In this paper we assume the following functional form for utility. For $Z$ a $1 \times J$ vector of variables
affecting choices we assume a linear-in-parameters model:

$$\varphi_s(Z) = Z\gamma(s).$$

We define

$$\varphi_{s,s'}(Z) = Z(\gamma(s) - \gamma(s')),$$

and

$$\eta(s, s') = \eta(s) - \eta(s').$$

If the $j$th coordinate of $\gamma(s)$ is zero, the $j$th variable does not affect the level of utility. We adopt the
notational convention that the first coordinate of $Z$ is the intercept. Array the contrasts of the unobservables
into a vector of length $\bar{S} - 1$ where the entry $\eta(s, s)$ ($= 0$) is deleted:

$$(\eta(1, s), \ldots, \eta(s - 1, s), \eta(s + 1, s), \ldots, \eta(\bar{S}, s)).$$

^12 Matzkin (1993) and Thompson (1989) consider the special case where utility functions are identical across choices. In
the linear-in-parameters case, they assume $\gamma(s) = \gamma$. See Cameron and Heckman (1998) for a more general analysis.

^13 The easiest way to see how this argument works is to integrate out all components of $T$ except the $k$th. For different
$(t_k, x)$ values, we can trace out pairs that keep the left side of (16) constant. (Recall that we know this CDF.) Applied
sequentially, this produces the components of $\mu(s, x)$ up to constants.

As a consequence of these assumptions, we may write

$$\Pr(T(s, x) < t \mid \hat{s} = s, X = x, Z = z) \Pr(\hat{s} = s \mid X = x, Z = z)
= \Pr(Q(s, x) < t - \mu(s, x),\ \eta(1, s) < \varphi_{s,1}(z), \ldots, \eta(\bar{S}, s) < \varphi_{s,\bar{S}}(z)), \quad s = 1, \ldots, \bar{S}. \qquad (17)$$

We know the left-hand side of these expressions and seek to determine all of the parameters generating
the right-hand side including the joint distribution of the unobservables. We have already established
how to identify the components of $\mu(s, x)$ up to intercepts. These can be obtained without assuming any
structure for $\gamma(s)$, $s = 1, \ldots, \bar{S}$.

First consider identification of the test system by a limit argument. We assume that the coordinates of
the contrast-in-choices vector are "variation free" or more precisely that they are measurably separated,
so they can be independently varied over their supports:

The support of $(\varphi_{s,\ell}(Z))_{\ell \neq s}$ matches the support of $(\eta(\ell, s))_{\ell \neq s}$ for
all $s = 1, \ldots, \bar{S}$, where the components are measurably separated with respect (A-3)
to each other ("variation free").^14

This assumption says that the support of the difference in the deterministic portions of the contrasts in
utility functions matches the support of the corresponding error terms and that we can independently
manipulate each argument holding the other arguments fixed. As a consequence of (A-3) and our
choice of functional forms for $\varphi_s(Z)$, there exist limit sets $\mathcal{Z}_s$ for each $s = 1, \ldots, \bar{S}$ such that as $Z \to
\mathcal{Z}_s$, $\Pr(\hat{s} = s \mid Z = z) \to 1$ for $s = 1, \ldots, \bar{S}$. These limit sets can be constructed by making coordinates of
$Z$ arbitrarily large or small. In these limit sets, we can identify

$$\Pr(Q(s, x) < t - \mu(s, x)), \qquad (18)$$

for each $s = 1, \ldots, \bar{S}$. Coordinate by coordinate, we can identify the intercepts of $\mu(s, x)$ since the mean
of each coordinate of $Q(s, x)$ exists and is known. For each coordinate, we may form $t_k - \mu_k(s, x)$, $k =
1, \ldots, K$, for each fixed $s, x$. From (18), in each limit set we may identify the joint distribution of $Q(s, x)$
from the variation in the $t_k$ which traces out the cumulative distribution function of $Q(s, x)$, $s = 1, \ldots, \bar{S}$.
Turning to identification of the choice system, consider choice system $s$ with $\bar{S} - 1$ contrasts

$$V(s) - V(\ell), \quad \ell = 1, \ldots, \bar{S},\ \ell \neq s.$$

Define the set of variables that appear with nonzero coefficients in the $s$ and $\ell$ utility systems by index
sets on the $Z$ and the associated $\gamma$ coefficients:

$$\mathcal{L}_{c,s,\ell} = \{j \mid \gamma_j(s) \neq 0 \text{ and } \gamma_j(\ell) \neq 0\}$$

where $\gamma_j(k)$ is the $j$th coordinate of the $k$th system of utility coefficients, associated with the $j$th component
of $Z$. The variables that are common to all $(s, \ell)$ pairs, $\ell = 1, \ldots, \bar{S}$, $\ell \neq s$, are associated with the
subscripts in

$$\bigcap_{\ell = 1,\ \ell \neq s}^{\bar{S}} \mathcal{L}_{c,s,\ell}.$$

Define the set of unique variables (relative to $s, \ell$) as those with nonzero coefficients in $s$ or $\ell$ but not both:

$$\mathcal{L}_{u,s,\ell} = \{j \mid \gamma_j(s) = 0 \text{ or } \gamma_j(\ell) = 0 \text{ but not both}\}.$$

These coefficients are unique within the $(s, \ell)$ pair (in $s$ or $\ell$, but not both). Many intermediate cases may
arise where variables are common between $s$ and $\ell$ but not between $s$ and $\ell'$ for various $\ell$ and $\ell'$ values
($\ell, \ell' \neq s$).

^14 See Florens, Mouchart and Rolin (1990) for a precise definition of measurable separability.

^15 The supports of both can be bounded by straightforward modifications of the initial assumptions. Then we require that
the supports match.

^16 We can alternatively use a median zero assumption.

^17 Use of these limit sets raises the possibility that identification is achieved only on null sets. Using a version of the
argument presented in Aakvik, Heckman and Vytlacil (1999) adapted to this context shows that this possibility is not
relevant.

Consider the binary choice between $s$ and $\ell$. Suppose that (A-3) is satisfied. In particular suppose that
for all choices $\ell'$ ($\ell' \neq s, \ell$) apart from $s$ and $\ell$ there are variables with zero coefficients in $\gamma(s)$ and $\gamma(\ell)$
with nonzero coefficients in $\gamma(\ell')$ that have full support in $\mathbb{R}$. This produces (A-3) given our assumed
functional form for utility. The following explicit exclusion condition produces identification:

There are nonempty sets of indices
$\{j \mid j > 1,\ j \notin \mathcal{L}_{c,s,\ell},\ j \notin \mathcal{L}_{u,s,\ell},\ \gamma_j(\ell') \neq 0 \text{ or } \gamma_j(\ell'') \neq 0\}$ for all $\ell', \ell'' \neq s, \ell$.
Thus for some $\gamma_j(\ell')$, $\gamma_j(\ell'')$, and $j > 1$, with zero coefficients in $s$ and $\ell$, the support
of the associated $Z_j$ is $\mathbb{R}$, for all $\ell', \ell'' = 1, \ldots, \bar{S}$, $\ell', \ell'' \neq s, \ell$. (A-3')



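Setting these variables to limit values, we obtain a limit binary choice model

$$\Pr(V(s) > V(\ell) \mid Z) = F_{\eta(s,\ell)}\left(\frac{Z[\gamma(s) - \gamma(\ell)]}{[\sigma(s,\ell)]^{1/2}}\right),$$

where $\sigma(s,\ell) = \mathrm{Var}(\eta(s) - \eta(\ell))$ and $\eta(s,\ell) = \eta(s) - \eta(\ell)$. By an argument due to Manski (1988), if we assume that

$Z$ is of full rank,^18 (A-4)

we can identify

$$\frac{\gamma_j(s) - \gamma_j(\ell)}{[\sigma(s,\ell)]^{1/2}}, \quad j \in \mathcal{L}_{c,s,\ell},$$

and either

$$\frac{\gamma_j(s)}{[\sigma(s,\ell)]^{1/2}} \quad \text{or} \quad \frac{\gamma_j(\ell)}{[\sigma(s,\ell)]^{1/2}}, \quad j \in \mathcal{L}_{u,s,\ell},$$

for variables excluded from $s$ or $\ell$ (but not both). By virtue of (A-3), we can identify the marginal distribution of $\eta(s,\ell) = \eta(s) - \eta(\ell)$ up to scale. The mean of this distribution is assumed to be zero.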

^18 Clearly this is a sufficient condition. We only need the components of $Z$ with nonzero coefficients to possess full rank, i.e., the components with indices in $\{j \mid j \in (\mathcal{L}_{c,s,\ell} \cup \mathcal{L}_{u,s,\ell})\}$.
This allows us to identify the intercept of the $s, \ell$ contrast. In addition, we can identify the marginal distribution of $\eta(s) - \eta(\ell)$ up to scale, $F_{\eta(s,\ell)}$, $s = 1, \ldots, S$, $\ell = 1, \ldots, S$, $\ell \neq s$.

We can repeat this argument for each utility contrast ($s$ with $\ell' \neq \ell$), and identify either contrasts in parameters (for those common across all utility contrasts) or unique parameters. Using parameters that are unique across the $\mathcal{L}_{u,s,\ell}$ sets for various $s$ values we can identify ratios of variances from ratios of utility contrasts. For example, suppose that $\gamma_j(s) = 0$ while at the same time, for various $\ell$ values, $\gamma_j(\ell) \neq 0$. At the same time suppose $\gamma_j(s') = 0$ but $\gamma_j(\ell) \neq 0$. Then we can identify

$$\frac{\gamma_j(\ell) / [\sigma(s,\ell)]^{1/2}}{\gamma_j(\ell) / [\sigma(s',\ell)]^{1/2}} = \left[\frac{\sigma(s',\ell)}{\sigma(s,\ell)}\right]^{1/2}.$$

We can repeat this argument for all $s = 1, \ldots, S$; $\ell = 1, \ldots, S$ to identify different combinations of parameters. Depending on the various configurations of $\mathcal{L}_{u,s,\ell}$, $\mathcal{L}_{c,s',\ell}$, $s \neq s'$, $\ell = 1, \ldots, S$, $\ell \neq s$ or $s'$ respectively, we can identify different ratios of variances.

Exclusions of the type just utilized are not strictly required to identify the model. As noted by Cameron and Heckman (1998) and extended by Aakvik, Heckman and Vytlacil (1999), the choice model can be identified with no exclusions if the contrast vectors are linearly independent:

$[\gamma(s) - \gamma(\ell)]_{\ell=1}^{S}$ is of full rank, and the number of continuous $Z$ variables with support in $\mathbb{R}$ is $S - 1$ or greater. (A-5)



Assumption (A-5) constitutes an alternative to identification by exclusion. The essential idea in this argument is that we can fix each contrast and vary the others (off to limit values), achieving a limit binary choice model. In this case (under A-4), we can obtain identification of the marginals $F_{\eta(s,\ell)}$, $s = 1, \ldots, S$; $\ell = 1, \ldots, S$, $\ell \neq s$, and the normalized contrasts

$$\frac{\gamma(s) - \gamma(\ell)}{[\sigma(s,\ell)]^{1/2}}.$$

However in this case, without exclusions, we cannot identify the ratios of variances obtained with exclusions. For details of this argument see Cameron and Heckman (1998) and Aakvik, Heckman and Vytlacil (1999).



From exclusion restrictions or rank conditions on the coefficients of contrast vectors in utilities, we can obtain identification of the choice system and the utility contrasts up to scale. We state a more general result for the joint choice-test score system. We can identify the full joint distribution of $(Q(s,x), \eta^{(s)})$ under the following assumption:

$\mathrm{Support}\left(\left[\varphi_{s,1}(Z), \varphi_{s,2}(Z), \ldots, \varphi_{s,S-1}(Z), \mu(s,X)\right]\right) = \mathbb{R}^{S-1+K}$, $s = 1, \ldots, S$, an assumption that the components are measurably separated ("variation free") with respect to each other. (A-6)

This assumption guarantees that we can vary the coordinates of $(\varphi_{s,s'}(Z), \mu(s,X))$ freely. We can obtain the $\varphi_{s,s'}(Z)$ (up to scale) using either exclusions or rank conditions. Exploiting this assumption, we obtain the following theorem.






Theorem 1. Under assumptions (A-1)-(A-4) and (A-6), $\mu(s,x)$, $\gamma(s) - \gamma(\ell)$ (up to scale $[\sigma(s,\ell)]^{1/2}$), $s = 1, \ldots, S$, $\ell = 1, \ldots, S$, and the joint distributions of $(Q(s,x), \eta^{(s)})$ (the second coordinate up to scale), $s = 1, \ldots, S$, are identified.



Proof. We have already established identification of $\mu_k(s,x)$, $k = 1, \ldots, K$, the marginal distribution of $\eta(s,s')$ up to scale and joint distributions of $Q(s,x)$. Under (A-6), we can vary each component of $\varphi(Z)$ and $\mu(s,x)$ for each $s' = 1, \ldots, S$, $s \neq s'$, holding the other components fixed. For all possible values of upper limits, we can trace out the joint distribution of $(Q(s,x), \eta^{(s)})$ nonparametrically. We can do this for all $s$. ∎









With exclusion restrictions we can improve on Theorem 1 by identifying ratios of the scale $\sigma(s,\ell) / \sigma(s',\ell)$ for some $\ell$ and $s$, $s'$ as previously discussed. Note that either (A-3') or (A-5) can be used to implement (A-3), but (A-3) is the key condition.

This proof can be adapted to the case where $T$ are indicator functions of latent variables using the argument in Carneiro, Hansen and Heckman (2003). Thus we can nonparametrically identify the distribution of the unobservables generating choices and test scores. In addition we can nonparametrically identify the $\mu(x)$ and the contrasts in utilities up to scale. We next turn to a factor analysis of the distributions of unobservables.



4.1 Factor Models 

In this paper we assume that the error term in the utilities has a one-factor specification,^19

$$\eta(s) = \alpha(s) f + u(s), \quad s = 1, \ldots, S. \qquad (19)$$

Define the $1 \times S$ vector $u$ as

$$u = (u(1), \ldots, u(S)),$$

where the $u(s)$ are mutually independent. We now assume $K \geq 2$ test scores at each schooling level with a factor structure $Q_k(s,x) = \lambda_k(s) f + \varepsilon_k(s)$, $k = 1, \ldots, K$, so test scores can be written as

$$T_k(s) = \mu_k(s) + \lambda_k(s) f + \varepsilon_k(s), \quad k = 1, \ldots, K. \qquad (20)$$

The $\mu_k(s)$ may be functions of $X$. For the rest of this section, we keep dependence on $X$ implicit for the sake of notational simplicity. Array these $K$ tests into a vector equation system for each schooling level $s$:

$$T(s) = \mu(s) + \lambda(s) f + \varepsilon(s), \quad s = 1, \ldots, S, \qquad (21)$$

where $T(s) = (T_1(s), \ldots, T_K(s))$, $\mu(s) = (\mu_1(s), \ldots, \mu_K(s))$, $\lambda(s) = (\lambda_1(s), \ldots, \lambda_K(s))$, and $\varepsilon(s) = (\varepsilon_1(s), \ldots, \varepsilon_K(s))$. We assume that the components of $\varepsilon(s)$ are mutually independent within and across each $s$ and are independent of $f$.

^19 Heckman (1981) and McFadden (1984) use factor structure error terms for discrete choice models. We extend their models to accommodate both discrete and continuous random variables.
We assume for the factor structure model

Independence for the full model: $(X, Z) \perp\!\!\!\perp (f, u, \varepsilon(s))$; $f \perp\!\!\!\perp u \perp\!\!\!\perp \varepsilon(s)$, $s = 1, \ldots, S$. (A-7)

Error terms $u(s)$ for the choice model are mutually independent and have $\mathrm{Var}(u(s)) = \sigma_u^2(s)$, $s = 1, \ldots, S$. (A-8)

Some normalizations are needed for identification of the choice model. One possible normalization is $\sigma_u^2(s) = 1$. Other normalizations are possible and are developed below.

The input for the factor analysis is the joint distribution of the unobservables produced from Theorem 1. Since we can only identify contrasts in latent utility levels, there are $S$ systems with $K$ tests each and $S - 1$ utility-normalized contrasts.

The utility contrasts and the test scores form $S$ systems of $K + S - 1$ random variables to which standard factor analysis (e.g. Anderson and Rubin, 1956) can be applied. Initially we assume no exclusion restrictions, so that ratios of variances of $\eta(s) - \eta(\ell)$ and of $\eta(s') - \eta(\ell)$ are not known ($s \neq s'$). We develop the case of exclusion restrictions at the end of this section. Under these definitions and normalizations, we obtain from (19) and (20) the following system of covariances for each system $s = 1, \ldots, S$:

$$\sigma(s,s') = \mathrm{Var}(\eta(s,s')) = (\alpha(s) - \alpha(s'))^2 \sigma_f^2 + \sigma_u^2(s) + \sigma_u^2(s') \qquad (22)$$

$$\frac{\mathrm{Cov}(\eta(s,s'), \eta(s,s''))}{[\sigma(s,s')]^{1/2} [\sigma(s,s'')]^{1/2}} = \frac{(\alpha(s) - \alpha(s'))(\alpha(s) - \alpha(s''))\sigma_f^2 + \sigma_u^2(s)}{[\sigma(s,s')]^{1/2} [\sigma(s,s'')]^{1/2}}, \quad s = 1, \ldots, S,\ s \neq s', s''. \qquad (23)$$

Recalling that $Q_k(s,x) = \lambda_k(s) f + \varepsilon_k(s)$, we obtain

$$\frac{\mathrm{Cov}(Q_k(s), \eta(s,s'))}{[\sigma(s,s')]^{1/2}} = \frac{\lambda_k(s)(\alpha(s) - \alpha(s'))\sigma_f^2}{[\sigma(s,s')]^{1/2}}, \quad s' = 1, \ldots, S,\ s' \neq s,\ k = 1, \ldots, K, \qquad (24)$$

$$\mathrm{Cov}(Q_k(s), Q_{k'}(s)) = \lambda_k(s) \lambda_{k'}(s) \sigma_f^2, \quad k \neq k'. \qquad (25)$$

The left hand sides of (23), (24) and (25) are known as a consequence of Theorem 1. If we make one normalization, e.g. $\lambda_1(1) = 1$, and if the conditions of Theorem 1 apply, we can identify all of the contrasts $\frac{\alpha(s) - \alpha(s')}{[\sigma(s,s')]^{1/2}}$, $s' \neq s$, the factor loadings $\lambda_k(s)$, $s = 1, \ldots, S$, $k = 1, \ldots, K$, and $\sigma_f^2$, provided that $K \geq 2$ and $K + S - 1 \geq 3$.

To see this, suppose $s = 1$. From the system (24) with $s = 1$, we may form the ratios

$$\frac{\mathrm{Cov}(Q_k(1), \eta(1,s'))}{\mathrm{Cov}(Q_1(1), \eta(1,s'))} = \lambda_k(1), \quad k = 1, \ldots, K.$$

From (25), for $s = 1$ we can obtain $\sigma_f^2$ since we know $\lambda_k(1)$ and $\lambda_{k'}(1)$, for all $k, k' = 1, \ldots, K$, assuming one normalization. From (24), given $\lambda_k(1)$ and $\sigma_f^2$ we can obtain $\frac{\alpha(1) - \alpha(s')}{[\sigma(1,s')]^{1/2}}$, $s' = 1, \ldots, S$. In this analysis we assume that $\lambda_k(s) \neq 0$, $k = 1, \ldots, K$, $s = 1, \ldots, S$.^20

^20 If this is not so then the effective dimension of the test system is reduced to the number of tests with nonzero factor loadings. A comparable analysis applies to the utility system.
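The role of the covariance restrictions can be illustrated with a small calculation. The sketch below is a simplified variant, not the argument used in the text: it assumes three tests at a single schooling level and uses only the test-score covariances in (25), whereas the text also exploits the choice contrasts in (24), which is what allows identification with two tests. The function names and the parameter values are hypothetical.

```python
def implied_covariances(loadings, var_f):
    """Covariances between distinct tests implied by the one-factor model:
    Cov(Q_k, Q_k') = lambda_k * lambda_k' * var_f, as in equation (25)."""
    n_tests = len(loadings)
    return {
        (k, kp): loadings[k] * loadings[kp] * var_f
        for k in range(n_tests)
        for kp in range(n_tests)
        if k < kp
    }


def recover_from_covariances(cov):
    """Recover (loadings, var_f) for three tests under the normalization
    lambda_1 = 1, using only the three pairwise covariances."""
    c12, c13, c23 = cov[(0, 1)], cov[(0, 2)], cov[(1, 2)]
    # (l2 * v) * (l3 * v) / (l2 * l3 * v) = v, the factor variance
    var_f = c12 * c13 / c23
    lam2 = c12 / var_f  # Cov(Q_1, Q_2) = lambda_2 * var_f when lambda_1 = 1
    lam3 = c13 / var_f
    return [1.0, lam2, lam3], var_f


# Hypothetical parameter values, chosen only for illustration.
true_loadings = [1.0, 0.8, 0.5]
true_var_f = 0.49
recovered_loadings, recovered_var_f = recover_from_covariances(
    implied_covariances(true_loadings, true_var_f)
)
```

With a zero loading the division by `c23` fails, which mirrors the requirement in the text that the factor loadings be nonzero.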
Turning to the system $s = 2$, armed with $\sigma_f^2$, we can identify all factor loadings $\lambda_k(2)$, $k = 1, \ldots, K$, from (24) for $s = 2$, using our knowledge of $\frac{\alpha(2) - \alpha(1)}{[\sigma(2,1)]^{1/2}}$ and $\sigma_f^2$. From (23), for $s = 2$, $s' \neq 1$, we can identify $\frac{\alpha(2) - \alpha(s')}{[\sigma(2,s')]^{1/2}}$, $s' = 3, \ldots, S$. By the same line of reasoning, we can identify all of the $\lambda_k(s)$, $k = 1, \ldots, K$, $s = 1, \ldots, S$, and the contrasts $\frac{\alpha(s) - \alpha(s')}{[\sigma(s,s')]^{1/2}}$.

Using (23), we can identify

$$\frac{\sigma_u^2(s)}{[\sigma(s,s')]^{1/2} [\sigma(s,s'')]^{1/2}} = \frac{\mathrm{Cov}(\eta(s,s'), \eta(s,s''))}{[\sigma(s,s')]^{1/2} [\sigma(s,s'')]^{1/2}} - \frac{(\alpha(s) - \alpha(s'))(\alpha(s) - \alpha(s''))\sigma_f^2}{[\sigma(s,s')]^{1/2} [\sigma(s,s'')]^{1/2}} \qquad (26)$$

since we know all of the right hand side terms either from data or the preceding argument. If we normalize $\sigma(s,s') = 1$ and $\sigma(s,s'') = 1$ for all $s, s'$, we identify $\sigma_u^2(s)$, $s = 1, \ldots, S$.^21 If we normalize $\sigma_u^2(s) = \frac{1}{2}$, then

$$\sigma(s,s') = (\alpha(s) - \alpha(s'))^2 \sigma_f^2 + 1.$$



We have identified (by the previous argument)

$$\frac{(\alpha(s) - \alpha(s'))\sigma_f}{[\sigma(s,s')]^{1/2}} = r(s,s'),$$

where

$$|r(s,s')| < 1.$$

Thus this normalization is equivalent to the normalization

$$\sigma(s,s') = \frac{1}{1 - [r(s,s')]^2} \geq 1.$$
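The equivalence of the two normalizations can be checked numerically. The sketch below assumes the variance decomposition in (22) with the unique variances normalized to one half each; the function names and the parameter values are hypothetical and serve only to illustrate the algebra.

```python
import math


def contrast_variance(alpha_s, alpha_sp, var_f, var_u=0.5):
    """sigma(s, s') from (22) when both unique variances equal var_u."""
    return (alpha_s - alpha_sp) ** 2 * var_f + 2.0 * var_u


def scaled_contrast(alpha_s, alpha_sp, var_f, var_u=0.5):
    """r(s, s') = (alpha(s) - alpha(s')) * sigma_f / sigma(s, s')^(1/2)."""
    sigma = contrast_variance(alpha_s, alpha_sp, var_f, var_u)
    return (alpha_s - alpha_sp) * math.sqrt(var_f) / math.sqrt(sigma)


# Hypothetical values for alpha(s), alpha(s') and the factor variance.
sigma = contrast_variance(1.2, 0.3, 0.49)
r = scaled_contrast(1.2, 0.3, 0.49)
# Since r^2 = (sigma - 1) / sigma, it follows that sigma = 1 / (1 - r^2).
implied_sigma = 1.0 / (1.0 - r ** 2)
```

Because $r^2 = (\sigma - 1)/\sigma$ under this normalization, `sigma` and `implied_sigma` agree up to floating-point error, and $|r| < 1$ holds for any finite contrast.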

When $S + K - 1 < 3$, the argument breaks down. Since $S = 2$ is the minimum number of choices for the system to be interesting, the breakdown comes with one test and two choices.^22 In this case, the only information is in (24), which is

$$\frac{\lambda_1(1)(\alpha(1) - \alpha(2))\sigma_f^2}{[\sigma(1,2)]^{1/2}}, \qquad \frac{\lambda_1(2)(\alpha(2) - \alpha(1))\sigma_f^2}{[\sigma(1,2)]^{1/2}}.$$

Even normalizing $\lambda_1(1) = 1$, we can only identify $\lambda_1(2)$ and the combination of parameters $(\alpha(1) - \alpha(2))\sigma_f^2$ up to an unknown scale. Additional normalizations must be made to identify these components separately.

From the joint distribution of (17) we can identify the distribution of $f$ and the distributions of the uniquenesses ($\varepsilon_1(s), \ldots, \varepsilon_K(s)$, and $u(s)$, $s = 1, \ldots, S$). To see why, recall from Kotlarski's Theorem (1967) that if

$$X_1 = Y + Z_1$$

$$X_2 = Y + Z_2$$



^21 Obviously the choice of these particular normalizations is arbitrary.
^22 In that case we lose the information in (23) and (25).



where $Y \perp\!\!\!\perp Z_1 \perp\!\!\!\perp Z_2$, from the joint distribution of $(X_1, X_2)$ we can identify the distributions of $Y$, $Z_1$, $Z_2$ under a mean zero assumption for $Z_1$ and $Z_2$ ($E(Z_1) = 0$; $E(Z_2) = 0$) or for $Y$ ($E(Y) = 0$). From the analysis of Theorem 1 we know the joint distribution of $T(s)$, $s = 1, \ldots, S$, $k = 1, \ldots, K$. Using (20) and invoking the normalizations previously discussed in the text following equation (26), we can write, for $\lambda_k(s) \neq 0$,

$$\frac{T_k(s) - \mu_k(s)}{\lambda_k(s)} = f + \frac{\varepsilon_k(s)}{\lambda_k(s)}, \quad k = 1, \ldots, K,\ s = 1, \ldots, S.$$

The expression on the left is known since $\lambda_k(s)$, $\mu_k(s)$, $s = 1, \ldots, S$, $k = 1, \ldots, K$, are identified by the previous argument. Applying Kotlarski's theorem we can identify the distribution of $f$ nonparametrically and the distributions of $\varepsilon_k(s)/\lambda_k(s)$, $k = 1, \ldots, K$, $s = 1, \ldots, S$, and hence the distributions of $\varepsilon_k(s)$, $k = 1, \ldots, K$, $s = 1, \ldots, S$.

From the joint distributions of

$$\frac{\eta(s,s')}{[\sigma(s,s')]^{1/2}} = \left(\frac{\alpha(s) - \alpha(s')}{[\sigma(s,s')]^{1/2}}\right) f + \frac{u(s) - u(s')}{[\sigma(s,s')]^{1/2}}, \quad s' = 1, \ldots, S,\ s' \neq s,$$

we obtain a two-factor model with the distribution of the first factor ($f$) known from the preceding analysis (as well as its factor loading). $u(s)$ is a second factor that is common across all outcomes based on $s$-contrasts and its factor loading is known by the normalizations previously presented. $u(s')$ is independent of $u(s)$ and $u(s'')$, $s'' \neq s, s'$, and of $f$ by assumption. Using deconvolution we can remove $f$ from the marginal distributions of $\frac{\eta(s,s')}{[\sigma(s,s')]^{1/2}}$ and apply Kotlarski's theorem to identify the joint distribution of $u(s)$ and $u(s')$, $s' = 1, \ldots, S$, $s' \neq s$.^23 The model is strongly overidentified when going across $s$ systems.

Thus far we have not exploited the information available through exclusion restrictions. Suppose that there is at least one variable in $V(s)$ that does not appear in $V(s')$, $s, s' = 1, \ldots, S$, $s' \neq s$, with full support ($\mathbb{R}$). Then we can identify $\sigma(s,\ell)$, $s \neq \ell$, $s = 1, \ldots, S$, $\ell = 1, \ldots, S$, up to a common scale. Thus we can identify the $\alpha(s) - \alpha(s')$ up to a common scale for all $s, s'$. With this information in hand, fewer normalizations have to be imposed. Thus we can relax one of the normalizations given under equation (20). If the exclusions are only partial, we identify various $\sigma(s,\ell)$ up to different common scales depending on the particular exclusions employed. We do not develop this topic further in this paper.

Recall that we have defined "$S$" in a general way. It can consist of different combinations of years of schooling completed ($S$) and age at entry date ($N$) and other states. Thus we can work with an indicator variable $D(S, N, \ldots)$ that defines schooling states for all $S, N, \ldots$ combinations as discussed in Section 3. This is the model that we estimate.



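^23 Since we know the distribution of $f$ from the analysis of the test score data, we can write the density of $\frac{\eta(s,s')}{[\sigma(s,s')]^{1/2}}$, which is known by virtue of Theorem 1, as

$$g_\eta\left(\frac{\eta(s,s')}{[\sigma(s,s')]^{1/2}}\right) = g_f\left(\frac{(\alpha(s) - \alpha(s')) f}{[\sigma(s,s')]^{1/2}}\right) * g_u\left(\frac{u(s) - u(s')}{[\sigma(s,s')]^{1/2}}\right),$$

where $*$ denotes convolution. We know the first term on the right-hand side (including the factor loadings). Thus we can form the characteristic functions of $\frac{\eta(s,s')}{[\sigma(s,s')]^{1/2}}$ and $\frac{(\alpha(s) - \alpha(s')) f}{[\sigma(s,s')]^{1/2}}$ and, using the inversion theorem, identify the density of $\frac{u(s) - u(s')}{[\sigma(s,s')]^{1/2}}$, $s' = 1, \ldots, S$, $s' \neq s$. For each $s$ system, $u(s)$ constitutes a separate factor apart from $f$.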
4.2 Allowing for Tests Taken During Schooling and Age Effects

The preceding framework is for the analysis of data on completed schooling ($S = S_T$). Let $S_T$ be schooling at the test date. From the assumption that persons who drop out do so only once,^24 and recalling that the age at the test date is $A$, we obtain $S_T = A - N$ (from (14)) if the individual is still in school at the date of the test. Conditioning on $N$ and $A$, $S_T$ is a number, not a random variable.

Assuming that sampling is random with respect to $A$, we can write the density of $S_T$ as the convolution of $N$ (which we model) and a random variable $A$ independent of $N$ (and all other variables) whose distribution we know from the sampling rule. We abstract from any issues of selective survival since the sample is young.

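The density of $S_T$ conditional on $X = x$ and $Z = z$ is

$$g(s_T \mid X = x, Z = z) = \sum_{a = \underline{A}}^{\overline{A}} \Pr(N = a - s_T \mid X = x, Z = z)\, P(A = a),$$

where $[\underline{A}, \overline{A}]$ is the range of survey ages (14-21 in the NLSY data we analyze) and

$$P(A = a) = \frac{1}{\overline{A} - \underline{A} + 1}.$$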



The density of test scores in the preceding section is conditional on $S_T = S$ (an event which was assumed to hold with probability one). Now we postulate that the event $S_T < S$ (further schooling after the test) may occur. Conditional on $S_T < S$, $S_T$ is a degenerate random variable given $A, N$. Thus $S_T$ is exogenous given $A$, $N$ and $S_T < S$ (i.e. $S_T \perp\!\!\!\perp f \mid N, A, S_T < S$).

We may pool the data on $S_T$ for $S_T < S$ with the data on $S_T$ for $S_T = S$ using this insight. Details about the likelihood for the pooled data are given in Appendix C.



4.3 Accounting for Ceiling Effects 



In the NLSY data a substantial number of test score observations "hit the ceiling," i.e., they achieve the maximum score on a particular test component. This is documented in Table A-3 (see Appendix A). To account for these ceiling effects we use a latent test score $T_k^*$ so that

$$T_k(s) = \begin{cases} T_k^*(s) & \text{if } T_k^*(s) < c_k, \\ c_k & \text{if } T_k^*(s) \geq c_k, \end{cases}$$

where $c_k$ is the maximum attainable score on test component $k$. Let the latent test score for an individual with schooling level $s$ at the test date be

$$T_k^*(s) = X\beta_k(s) + \lambda_k(s) f + \varepsilon_k(s)$$
^24 As previously noted, this assumption is supported for schooling through high school in Cameron and Heckman (2001).
where $X$ is a set of observed covariates and $f$ is the unobserved factor. Identification with censored random variables can be established by a straightforward modification of Theorem 1 given sufficient support on $X$.^25
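The censoring rule translates directly into a likelihood contribution. The sketch below assumes, purely for illustration, that the test-score error is normal conditional on the covariates and the factor; the source does not state the error distribution here, and the function names are hypothetical.

```python
import math


def observed_score(latent_score, ceiling):
    """Observed test score: the latent score censored from above at the ceiling."""
    return min(latent_score, ceiling)


def loglik_contribution(observed, ceiling, mean, sd):
    """Log-likelihood of one observed score given its conditional mean
    (covariate index plus loading times factor), ASSUMING a normal error
    with standard deviation `sd`."""
    if observed < ceiling:
        # Uncensored: log of the normal density at the observed score.
        z = (observed - mean) / sd
        return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)
    # Censored: log Pr(latent score >= ceiling), the normal survival function.
    z_ceiling = (ceiling - mean) / sd
    return math.log(0.5 * math.erfc(z_ceiling / math.sqrt(2.0)))
```

Treating scores at the ceiling as survival probabilities rather than as exact observations is what keeps a mass of top scores from mechanically compressing the estimated loadings at high schooling levels.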

5 Empirical results 

We now present findings from estimating the joint schooling and test score model on the NLSY data discussed in Appendix A. We consider four completed schooling groups: high school dropouts, high school graduates, individuals with some college, and four-year college graduates. We group GEDs with high school dropouts.^26 We group associate's degrees (junior college graduates) with some college. In addition we group respondents into two categories by age at entry into schooling. Let $N = 0$ if an individual began schooling at age 6 or earlier; let $N = 1$ otherwise. We estimate a choice model with $4 \times 2 = 8$ potential outcomes (combinations of completed schooling and age at entry).

Over two-fifths of the sample (870 individuals, or 42.11% of the sample) had yet to complete high school as of July-October 1980 when the ASVAB was administered. As a consequence we are able to break up this group into three subgroups of schooling level at the test date: those with nine years of schooling or less (205), those with ten years of schooling (322), and those with eleven years of schooling, including some dropouts with more than eleven reported years of schooling (342). We are thus able to trace out schooling and ability effects for six levels of schooling, including high school graduation and college attendance. Appendix A describes the features of our sample and the variables used to estimate the models.

5.1 Control Function Estimates 

We first present nonparametric estimates from the control function estimators outlined in Section 3. Appendix B describes the econometric procedure used to produce the estimates. It is written for the specific case analyzed in this paper, with six values of $S_T$ and four values of $S$.

Tables 1 and 2 present estimates for the simple case analyzed at the beginning of Section 3, where we do not control for age effects or endogeneity of entry into schooling. Table 1 reports estimates of the factor loadings $\lambda(s_T)$. Since the model is overidentified we can compute estimates of $\lambda(s_T)$ using

^25 A prototype for this proof is in Carneiro, Hansen and Heckman (2003), who show how to identify a related model under the case that analysts only observe $1(T_k^*(s) < c_k)$ or $1(T_k^*(s) \geq c_k)$. Extension to the censored case is straightforward and for the sake of brevity is omitted here.

^26 The GED is an exam certification for high school equivalency for those who do not earn the degree by the traditional route of finishing high school. Our grouping is based on work by Cameron and Heckman (1993).

^27 Of the 1,404 individuals in the "normal/ahead" category ($N = 0$), 1,087 (77.42%) entered school at age 6 and 317 (22.58%) entered school at an earlier age. Since we model choice of schooling and age at entry jointly, further stratifying into 3 age-at-entry categories would produce a model with 12 possible choices and some cells would be very small. Specification checks suggest that combining the "normal" and "ahead" groups is fairly innocuous.
information for different completed schooling groups.^28 In Table 1 both the unrestricted estimates and estimates obtained by imposing the overidentifying restrictions using a minimum distance approach are shown.^29 The $\chi^2$-statistics do not reject the overidentifying restrictions. Recalling that $\lambda(1)$ has been normalized to one, the estimates of the remaining $\lambda(s_T)$'s indicate a decreasing effect of latent ability on test scores as schooling at the date of the test increases (the estimates of $\lambda(s_T)$ are decreasing with $s_T$). Table 2 reports the minimum distance estimates of the intercepts and control functions. Again the $\chi^2$-test fails to reject the overidentifying restrictions implied by the model. Estimated schooling effects (for the average person, with $f = 0$) range from 3.61 to 9.02 AFQT points per year of schooling. The estimates imply an expected test score function which is roughly linear in schooling. As expected, the estimates of the control functions (which are conditional expectations of the factor $f$) are increasing in schooling. The control functions for the different completed schooling categories are clearly statistically different from one another. We identify the scale of $f$ by normalizing $\lambda(1) = 1$. Thus, any comparisons are conditional on this normalization, a feature shared with the structural estimates reported below.

We can interpret $\lambda(s_T)\left[E[f \mid S = s] - E[f \mid S = s']\right]$ as the expected difference in test scores for two individuals with the same schooling at the test date, $s_T$, but with different levels of completed schooling. The fact that $\lambda$ is declining in $s_T$ implies that the test score difference between individuals with different completed schooling levels declines with schooling at the test date. In other words, the test magnifies differences in latent ability at low schooling levels and dampens differences at higher schooling levels.^30

Tables 3 and 4 present nonparametric control function estimates for the case where $f$ depends on the age of schooling entry, $E(f \mid N = n, S_T = s_T)$, but $N$ does not otherwise enter the model. In this case we can identify $\lambda(s_T)$ by differencing test scores conditioning on fixed $S_T$, $S = s$ and varying $N = 0, 1$. Appendix B.1 presents the estimation procedure used to construct these estimates. Knowing $\lambda(s_T)$ we can identify $\mu(s_T)$. Table 3 reports the loadings estimated using the minimum distance approach. Again we fail to reject the overidentifying restrictions. In addition, the pattern of declining estimates with schooling is still present. The loading for college, $\lambda(6)$, is estimated to be zero. This is most likely caused by the presence of ceiling effects, as this pattern is not found to the same extent using the structural model (see next section). Table 4 presents the estimated intercepts and control functions. There is now less evidence for the restrictions implied by the model (the p-value for the $\chi^2$-test is 0.01). However, the estimated test score function (assuming $f = 0$) is quite similar to the one estimated without entry effects, especially during the high school years, before diverging slightly at the "Some College" level. Estimated schooling effects therefore remain high, between 2.37 and 8.93 AFQT points per year of schooling. The control functions now depend on both schooling and entry age. As expected the estimates are increasing in completed schooling and entry state (individuals who start at an older age have on average lower cognitive ability). Note, however, that entry state has a much smaller effect for the "Some College" and "College" categories than for the lower schooling categories.

^28 We obtain six different estimates of $\lambda(2)$ by comparing different completed schooling groups following the discussion in Section 3.

^29 Since we can only identify ratios of the $\lambda(s_T)$ we have normalized $\lambda(1)$ to one.

^30 This could be due to ceiling effects. The structural model estimates reported in the next section take ceiling effects into account.
Table 5 presents estimated intercepts and control functions for a model allowing direct $N$-effects on the test score in addition to controlling for potential dependence of $f$ on $N$. As discussed in Section 3, estimating a model controlling for both entry age $N$ and age at the test date $A$ requires more structure in order to break the fundamental identification problem resulting from the confluence of $N$ and $A$. The estimated intercepts $\mu(N, S_T)$ are uniformly larger for late-starters (who are older when they take the test) than they are for those who begin their schooling at the normal time.^31 Recall, however, that if there are independent age effects, then the difference $\mu(1, S_T) - \mu(0, S_T)$ captures those effects as well as any discouragement effects. As noted in Section 3 we cannot identify an independent age effect. However, we can reject the joint hypothesis of no $A$ and $N$ effects. The evidence points to a much stronger role for age (maturation) in influencing test scores than any discouragement effects from being held back, as people who are older at any schooling level have higher test scores.

In the next section we present estimates from the structural, semi-parametric model discussed in Section 4. Taking a structural approach to the problem we can estimate a more general model of schooling
^31 Note that in order to estimate the model we must restrict $\mu(s_T, 0) = \mu(s_T, 1)$ for some $s_T$. We report estimates for the model imposing the restriction for $s_T = 5$. To see why we must impose equality between at least one pair of intercepts, note that the moment conditions for this case are (letting $\bar{T}$ denote the conditional mean of $T$ and $c(s,n) = E[f \mid S = s, N = n]$):

$$\bar{T}(s, s_T, n) = \mu(s_T, n) + \lambda(s_T) c(s,n). \qquad (27)$$

In the sample, $n = 0, 1$ and $s = 1, 2, 3, 4$. This gives us $(8 - 1) = 7$ control functions to estimate since the weighted sum of the control functions is zero. Note that given $s_T$ and $n$ we have a maximum of four conditions for determining $\mu(s_T, n)$:

$$\bar{T}(s, s_T, n) = \mu(s_T, n) + \lambda(s_T) c(s,n), \quad s = 1, \ldots, 4. \qquad (28)$$

How much data is needed to identify the control functions? Suppose we consider only one $s_T$ value, say $s_T = 1$, and $n = 0, 1$. This yields eight moment conditions:

$$\bar{T}(s, 1, 0) = \mu(1, 0) + \lambda(1) c(s, 0), \quad s = 1, \ldots, 4,$$

$$\bar{T}(s, 1, 1) = \mu(1, 1) + \lambda(1) c(s, 1), \quad s = 1, \ldots, 4.$$

Note that under our previous assumptions we had $\mu(1, 0) = \mu(1, 1) = \mu(1)$. If this restriction holds the model is identified, since by taking contrasts we can identify the 7 differences $c(s, n) - c(4, 1)$, and using the sum restriction on the control functions we get that all of the $c(s, n)$ are identified and then the single intercept $\mu(1)$ is identified. Recall the argument in Section 3.

If we allow for separate intercepts, $\mu(1, 0) \neq \mu(1, 1)$, the model is no longer identified, since we can now only identify the differences $c(s, 0) - c(4, 0)$ and $c(s, 1) - c(4, 1)$. Thus, we can only identify six differences and so we cannot identify the control function elements. Note that this problem persists no matter how many $s_T$ values we use. We can only identify the six differences mentioned above. To obtain the required normalization we can restrict $\mu(s_T, n = 0) = \mu(s_T, n = 1)$ for one $s_T$ value.
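The counting argument in this footnote can be checked as a rank condition. The sketch below sets up the eight moment conditions for a single test-date schooling level with the loading normalized to one, adds the weighted-sum restriction on the control functions, and compares the rank of the system with the number of unknowns. The cell weights and the function names are hypothetical, and the rank routine is a generic exact Gaussian elimination rather than anything taken from the paper.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix by exact Gauss-Jordan elimination on fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def moment_system(separate_intercepts, weights):
    """Coefficient rows for the unknowns (intercepts, c(s, n)) at one
    test-date schooling level with the loading normalized to one.
    `weights` are the eight cell weights in the sum restriction."""
    n_intercepts = 2 if separate_intercepts else 1
    cells = [(s, n) for n in (0, 1) for s in (1, 2, 3, 4)]
    rows = []
    for i, (_, n) in enumerate(cells):
        row = [0] * (n_intercepts + len(cells))
        row[n if separate_intercepts else 0] = 1  # intercept for entry state n
        row[n_intercepts + i] = 1                 # control function c(s, n)
        rows.append(row)
    rows.append([0] * n_intercepts + list(weights))  # weighted sum of c is zero
    return rows


weights = [1, 2, 3, 4, 1, 2, 3, 4]  # hypothetical positive cell weights
common = moment_system(False, weights)
separate = moment_system(True, weights)
rank_common, unknowns_common = matrix_rank(common), len(common[0])
rank_separate, unknowns_separate = matrix_rank(separate), len(separate[0])
```

With a common intercept the rank equals the number of unknowns, so the system pins down every parameter. With separate intercepts the rank falls one short, which corresponds to the single additional restriction the footnote imposes.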

^32 Given our "no return to school for dropouts" assumption, people who start school one year later are also one year older at schooling level $S_T$ than are people who start school at a normal age if they have not completed their schooling at the test date ($S_T < S$). However, in order to estimate the model we must include individuals with completed schooling at the test date ($S = S_T$) in order to observe the boundary group $S = 1$. Conditioning on the entire sample means that varying $N$ is not equivalent to varying $A$ and, even in the absence of $N$ effects, we cannot identify an independent age effect using this procedure.
and test scores allowing for age effects, endogenous entry into schooling and testing ceiling effects. We can also condition on covariates such as family background and local labor market variables which may influence the choice of schooling. However, the estimates from the control function approach are in broad agreement with estimates from the structural model.

5.2 Estimates From the Structural Model

We now present empirical results from the structural model of schooling and test scores presented in Section 4. We use Bayesian MCMC methods to estimate the sample likelihood for the model of Section 4. Details of the algorithm are presented in Appendix C. Our use of Bayesian methods is only a computational convenience. Under our identifying assumptions, the priors we use are asymptotically irrelevant. Our identification analysis is strictly classical.

Table 6 reports exclusion and inclusion restrictions for each equation of the structural model. The common variables in the choice system (included in all but the "college/behind" index) are family background (urban status, broken home status, number of siblings, southern dummy, mother's and father's education, family income) and birth cohort dummies. Choice-specific variables are: local wage and unemployment rate for high school dropouts, high school graduates, and those with some college for equations with the corresponding schooling groups, and tuition and distance to four-year college in the college equations. Quarter-of-birth dummies are included in the "behind" equations. We invoke identification assumption (A-5) because we lack exclusions. We adopt linear-in-parameters utility functions.

We parameterize the latent test score equations as follows:

$$T_k^*(s) = X\beta_k(s) + \lambda_k(s) f + \varepsilon_k(s), \quad k = 1, \ldots, K,\ s = 1, \ldots, S_T,$$

where $X$ is a set of observed covariates, including age, which we restrict to have a linear effect. Covariates in the test score equations include family background variables, age (as of December 31, 1980), and a dummy variable for in-school status at the test date. We estimate twenty-four test equations: four equations for each AFQT component (Word Knowledge, Paragraph Comprehension, Arithmetic Reasoning and Mathematics Knowledge) for each of six levels of schooling at the test date.^33

The computational algorithm used to estimate the model parameters is discussed in detail in Appendix 
C. Due to space constraints detailed parameter estimates of the models are posted at 
http://home.uchicago.edu/~kjmullen/Schooling_JOE.htm. 

5.2.1 Model Fit 

We first discuss the fit of the estimated model to the data. Tables 7 and 8 describe the fit of the model to the data for the schooling choice and test systems, respectively. The fit reported in Table 7 is quite good both overall and in partitions of the data on selected covariates. Figures 1(a)-(f) plot the fitted AFQT

^33 In addition to the covariates above we included a dummy variable in the test score equations for having completed strictly less than 9 years of school to allow for possible heterogeneity in the grade school and ninth grade composite group; the coefficient on this dummy was insignificant for all tests.
test score distribution against the actual empirical CDF for each schooling group. We pass goodness of fit tests at conventional levels of significance for most groups, though the figures reveal a slight tendency to underpredict scores for the lowest schooling category and to overpredict test scores for the highest two schooling groups. The goodness of fit statistics are reported in Table 8. The fit is worst for the most heterogeneous groups, "ninth grade or less" and "some college." In fact, the poor fit in the "some college" category causes us to fail the overall test of fit.^34 Excluding that group we would pass the overall test.

5.2.2 Estimated Cognitive Ability Distribution 

Figure 2 displays the estimated latent ability or factor distribution plotted against “residualized AFQT”
(constructed by running an ordinary least squares regression of standardized AFQT score on family
background and cohort dummies). Recall that the location and scale of the latent ability distribution
must be set since they are not identified in the model. This is a standard result in factor analysis.
We set the location by constraining the unconditional mean of the factor to be zero (note that the
residualized AFQT distribution also has mean zero by construction). The scale is set by a normalization
in one of the test score equations. Specifically, we set to 1 the coefficient on the factor in the equation
for the Word Knowledge test component (standardized to have within-sample mean 0 and variance 1)
estimated for individuals who had completed eleven years of schooling at the test date.^^

The estimated factor density is not normal; it closely tracks, but does not completely resemble, the
conventional residualized AFQT density. Residualized AFQT computed by OLS (not accounting for
schooling or selection effects) is an imperfect measure of cognitive ability. While the mean of the factor is
fixed to 0, the estimated median, 0.1158, is positive, so that more than half of the population has above
average ability. However, the estimated factor distribution is skewed negative: a person who
is at the 2.5th percentile in ability is more than half a standard deviation further away from the average
(at -1.4846) than a person at the 97.5th percentile (with ability 1.1131).
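The asymmetry can be checked directly from the reported quantiles. The following is a minimal sketch that uses only figures quoted in the text (the two tail percentiles and the factor standard deviation of 0.7027 given in the footnote); the variable names are illustrative and not part of the original analysis.

```python
# Figures quoted in the text; variable names are illustrative only.
factor_mean = 0.0    # location normalization of the latent factor
factor_sd = 0.7027   # estimated standard deviation of the latent factor
p025 = -1.4846       # ability at the 2.5th percentile
p975 = 1.1131        # ability at the 97.5th percentile

# Distance of each tail quantile from the mean, in factor standard deviations.
lower_distance = (factor_mean - p025) / factor_sd
upper_distance = (p975 - factor_mean) / factor_sd
asymmetry = lower_distance - upper_distance

print(f"lower tail: {lower_distance:.2f} SD from the mean")
print(f"upper tail: {upper_distance:.2f} SD from the mean")
print(f"difference: {asymmetry:.2f} SD")
```

The difference exceeds one half of a factor standard deviation, which is the sense in which the lower tail is longer than the upper tail.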

5.2.3 Allowing for Age and Endogenous Entry Dates 

By using the sample $S = S_T$, we break the dependence between $A$ and $N$ given by (14). We parameterize
age effects on test scores by assuming

\[
\lambda(A, S) = \lambda(S),
\]

\[
\mu(A, S) = \beta_1(S) A + \beta_2(S),
\]

where $\beta_1(S)$ and $\beta_2(S)$ are unrestricted functions of $S$. In this paper we explicitly model the relationship
between entry date $N$ and latent ability $f$. The model specifies a joint $S \times N$ space. How important is it
to account for endogeneity of $A$? Figure 3 plots the distributions of latent ability $f$ conditional on entry
status $N$. Note that individuals who are behind their peers on average have lower latent ability than their

^‘‘The P value for the overall fit of the model excluding some college is 0.1050. 

^®The estimated standard deviation of the factor is 0.7027.

Attempts to estimate age-dependent $\lambda$ led to very imprecise estimates.



counterparts who are age-grade normal or ahead, especially those who do not attend any college. Failing
to correct for endogenous entry effects would lead us to underestimate the effect of cognitive ability on
dropping out versus graduating high school, especially at lower levels.^^

Figure 4 plots the estimated age at test effects for each $N \in \{0, 1\}$ group. The estimated
maturation effects are roughly constant across ages. As in the control function estimates, the net effect
of age at the test on measured test scores is positive.

5.2.4 Schooling Behavior 

The structural approach models the schooling decision explicitly. Thus we can estimate the relationship
between cognitive ability $f$ and schooling choice. Correcting for endogenous schooling effects on AFQT
turns out to have some interesting implications for inference about the effects of ability on schooling
choice. Figures 5(a)-(b) plot schooling choice probabilities as a function of observed ability for a simple
multinomial choice model that conditions on residualized AFQT (i.e., observed minus predicted AFQT,
where predicted AFQT is formed by regressing standardized raw AFQT score on family background
characteristics and cohort dummies, not correcting for schooling effects), stratified by entry age. This
model assumes that residualized AFQT is a perfect proxy for latent ability, so $T_k(s) - X\hat{\beta} = \alpha_1 + \alpha_2 f$,
where $\alpha_1, \alpha_2$ are constants. In this conventional specification, measured ability is a strong predictor of
schooling decisions, especially high school dropout and college-going decisions. For those who are
age-grade normal or ahead of their age-peers in their schooling, the probability of dropping out conditional
on a low residualized AFQT score (e.g., a score of -1.8 at the 2.5th percentile) is about 31.9%, compared
to the population rate of 10.7%. For individuals who are behind their peers the difference is even more
pronounced: the predicted probability of dropping out conditional on a score at the 2.5th percentile is
57.7%, compared with a population rate of 27.2%. At the upper end of the AFQT distribution, the estimated
probabilities of graduating college with an AFQT score of 1.46 at the 97.5th percentile are 71.2% and
67.5% for the “normal/ahead” and “behind” groups, respectively. The corresponding population rates are
33.9% and 21.6%, respectively.

Figures 6(a)-(b) plot estimates from the model of Section 4. For small values of the factor $f$ (at
the 2.5th percentile of the estimated factor distribution) the probabilities of dropping out
of high school (21.2% and 42.8%) are almost 11 and 15 percentage points smaller than the comparable
probabilities estimated by the simple model above. Larger factor values imply college probabilities of
65.8% and 56.9% which are just over 10 percentage points lower than those estimates produced by the
model using residualized AFQT.

Aside from measurement error bias, there is a fundamental econometric problem associated with
estimating a schooling choice model which conditions on a measure of ability which has been constructed
without accounting for reverse causation (i.e., that schooling affects measured ability). Ignoring the

^^Note that in the structural model we do not allow for direct $N$-effects on test scores, which would increase the dimension
of the test score system to 2 x 4 x 6 = 48 equations. We do, however, allow for linear age effects in the means (these are 
graphed in Figure 4). The evidence from the control function approach outlined in Section 5.1 supports the idea that age 
effects are more important than “late starter” effects. 
simultaneity problem leads to substantial overstatement of the role of cognitive ability in explaining
schooling decisions.

Figures 7 and 8 demonstrate the importance of this point. Figures 7(a)-(b) plot the estimated
residualized AFQT densities conditional on completed schooling, stratified by entry status. The estimated
densities are standardized so that the unconditional density has variance 1 to facilitate comparison with
the structural model estimates. A key feature to note is the degree of separation in the conditional
“ability” distributions. Failing to correct for schooling effects on measured ability leads one to predict
a strong causal relationship between schooling choice and cognitive ability. Figures 8(a)-(b) show the
estimated factor distributions (again, standardized) estimated from our corrected model conditional on
completed schooling and entry date. In the corrected model, the cognitive ability distributions are much
less stratified.

Taken together, these findings suggest that the previous literature has overstated the role of latent
cognitive ability in explaining schooling. This leaves more room for non-cognitive factors. (See the
evidence on the importance of noncognitive factors in Heckman and Rubinstein, 2001.)

5.2.5 Effect of Ability on AFQT 

An important feature of the structural approach developed in this paper which distinguishes it from
conventional models of determinants of achievement test scores is that we can estimate the effects of
latent ability on manifest test scores in addition to causal schooling effects on test scores. The usual
approach treats unobserved ability as a nuisance variable that biases the parameter of interest, the effect
of schooling on measured ability, and focuses on ways to eliminate its influence. We model latent ability
and its influence on schooling explicitly, allowing us to investigate the relationship between latent ability
and measured ability.

Figures 9(a)-(d) show the marginal effect of a standard deviation increase in latent ability on each of
the four AFQT test components for different schooling levels at the test date.^*^ In several cases the gap
in the expected test score between two persons one standard deviation apart in intelligence (unless one of
them is at or near the maximum score) is quite large, up to 20% of the total number of points possible
on the test. Schooling affects verbal and mathematical skills differently. Moreover, we can see that while
the marginal effects of ability decrease with additional schooling for the two verbal test components, the
marginal effects of ability on the mathematics components are roughly constant or slightly increasing over
schooling levels.

Figure 10 shows that the marginal effect of a standard deviation increase in cognitive ability, aggregated
over the four test components, ranges from 12.5 to 17.8 AFQT points, or about 12 to 17% of the maximum
possible score of 105. The effect increases initially from ninth to tenth grade where it reaches its peak

^*In Figure 9(a) the collapse of the confidence bands at 11 years of schooling is due to a normalization, i.e., setting
$\lambda(S_T = 3) = 1$ in an equation where standardized Word Knowledge (WK) is the dependent variable. Multiplying $\lambda$ by the
standard deviation of the original WK test (7.0327) converts the effect into test score points. Multiplying further by the
standard deviation of the latent factor (0.7027) gives the effect in points on the WK test of increasing the latent ability by
one standard deviation.
and appears to fall thereafter. In a formal test we cannot reject the hypothesis that the marginal effect
of latent ability on AFQT score is the same across all schooling levels at the 5% significance level. Recall
that this is the implicit assumption used by Winship and Korenman (1997), Herrnstein and Murray (1994)
and Neal and Johnson (1996).
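The unit conversion described in the footnote on the Figure 9(a) normalization can be written out explicitly. The sketch below is only an illustration of that arithmetic: the two standard deviations are the values quoted in the footnote, while the function name and constants are our own labels.

```python
WK_TEST_SD = 7.0327   # standard deviation of the raw Word Knowledge score (from the footnote)
FACTOR_SD = 0.7027    # estimated standard deviation of the latent factor (from the footnote)

def wk_points_per_ability_sd(loading):
    """Points gained on the raw WK test from a one-SD increase in latent ability.

    `loading` is the factor loading in the standardized WK equation.
    """
    return loading * WK_TEST_SD * FACTOR_SD

# At 11 years of schooling the loading is normalized to 1.
effect_at_11_years = wk_points_per_ability_sd(1.0)
print(round(effect_at_11_years, 2))
```

Because the loading at 11 years of schooling is fixed by the normalization rather than estimated, this value has no sampling variability, which is why the confidence band collapses at that point.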

5.2.6 Effect of Schooling on AFQT 

In Figure 11 we switch perspectives and show the effect of schooling on test scores for fixed levels of latent
ability. This figure demonstrates strong effects of schooling on test scores. From grade school to college
a given individual can expect to improve his performance on the AFQT by about 18 to 31 points (16 to
29.5 percentage points), depending on his initial ability level. Figure 11 shows that the largest schooling
effects are found for individuals with very low ability levels. However, even with more than 15 years of
schooling the test scores of the individuals at the 2.5th percentile do not quite reach the average test score
that persons at the median achieve with just a ninth grade education.

Individuals at the very top of the ability distribution (the 97.5th percentile) are within roughly five
points of the test score ceiling by 11 years of schooling. The estimated AFQT test score functions are
roughly parallel across ability levels. In fact the gap in AFQT scores between the 25th and 75th percentiles
closes only four points (after widening slightly at first) between ninth grade and the college years. The
functions are roughly linear.

Figure 11 shows that the effects of schooling on AFQT are highest during the early high school years
(between grades 9 and 11) for all ability levels: between 3.5 and 6.6 points per year on average, varying
by ability level. After ninth grade schooling effects decrease with latent ability. The average estimated
annual schooling effect, varying across ability levels, is between 3.4 and 4.1 AFQT points (or 3.2 to 3.9
percentage points). Figure 12 plots estimated schooling effects with 95% confidence bands for the average
person (with $f = 0$), which vary from 1.2 points (transiting from 11th to 12th grade) to 5.4 points (from
9th to 10th grade), with an average schooling effect of 3.7 AFQT points (3.5 percentage points). In other
words, a one year increase in schooling is associated on average (across schooling levels) with a 0.16-0.19
standard deviation increase in the AFQT score across ability levels; the estimated increase in AFQT score
per year of education for the average person ($f = 0$) is 0.17 standard deviation.
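The percentage-point figures in parentheses are the point effects expressed relative to the maximum possible AFQT score of 105 stated in Section 5.2.5. A minimal sketch of that conversion, with an illustrative helper name of our own:

```python
MAX_AFQT = 105  # maximum possible AFQT score, as stated in Section 5.2.5

def as_percent_of_max(points):
    """Express an effect in AFQT points as a percentage of the maximum score."""
    return 100.0 * points / MAX_AFQT

# Average annual schooling effects quoted in the text, in AFQT points.
annual_effects = {
    "lower bound across ability levels": 3.4,
    "upper bound across ability levels": 4.1,
    "average person (f = 0)": 3.7,
}
for label, points in annual_effects.items():
    print(f"{label}: {points} points = {as_percent_of_max(points):.1f} percentage points")
```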

5.3 Comparison of Control Function and Structural Model Results

Figures 13 and 14 summarize our estimates obtained from using both nonparametric and structural
approaches. Figure 13 plots the estimated ratios of the factor loadings $\lambda(s_T)/\lambda(1)$.^*^ Note that the
general pattern of declining ability effects is consistent across all models although it is more pronounced
for the nonparametric estimates.

Figure 14 plots estimated intercepts $\mu(s_T)$ for all models.'*® Again the models are in agreement,
especially for the high school years. The control function estimates (which do not control for other determinants

*®The structural model estimates of the factor loadings of the four AFQT components are converted into estimated 
marginal effects for overall AFQT score, as in Figure 9. 

‘‘^Recall that the structural model controls for covariate effects. The appropriate comparison to the control function



of test scores) tend to be steeper than the structural model estimates which control for family background
variables. The control function estimates range from 2.37 to 9.02 AFQT points while the structural
estimates vary less and tend to be smaller, ranging from 2.79 to 4.2 points on average. Overall, the agreement
is close.



6 Summary, Conclusions and Applications to the Returns to 
Schooling 



This paper develops two methods for estimating the effect of schooling on measured test scores when both
schooling and measured test scores depend on latent ability. The methods are applied to NLSY data, and
produce estimates that are in general agreement with each other.

We find that schooling increases the AFQT score on average between 2.79 and 4.2 points per additional
year of education. The effect of schooling on test scores is constant across schooling levels. Our estimates
are roughly twice as large as the estimates reported in Herrnstein and Murray (1994). They are in line with
the estimates reported in the literature reviewed by Winship and Korenman (1997) who report schooling
effects on the order of 2 to 4 IQ points, or 2.9 to 5.7 AFQT points. Our analysis shows that schooling
has small equalizing effects on measured test scores especially for those with low ability and low levels of
schooling. Our analysis also demonstrates that the estimated effect of latent cognitive ability on attending
school has been overstated in the previous literature that does not correct for reverse causality between
schooling and test scores.

Our analysis also has important implications for the empirical literature on the effects of ability and/or
tests on wages. Suppose that

\[
\ln W = \alpha_0 + \alpha_1 s + \alpha_2 f + \varepsilon(W). \tag{29}
\]

The causal effect of a unit increase in schooling is $\alpha_1$. A common strategy in the empirical literature on
wage equations is to proxy $f$ by $T$ to avoid ability bias arising from the dependence of $f$ on $s$. Using (1)
in equation (29) to solve out for $f$, we obtain



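\[
\begin{aligned}
\ln W &= \alpha_0 + \alpha_1 s + \alpha_2 \frac{T(s) - \mu(s) - \varepsilon(s)}{\lambda(s)} + \varepsilon(W) \\
      &= \alpha_0 + \alpha_1 s + \frac{\alpha_2 T(s)}{\lambda(s)} - \frac{\alpha_2 \mu(s)}{\lambda(s)} - \frac{\alpha_2 \varepsilon(s)}{\lambda(s)} + \varepsilon(W).
\end{aligned}
\]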

This is a bad proxy for two reasons: (a) the usual problem that $\alpha_2 \varepsilon(s)/\lambda(s)$ is correlated with $T(s)$;
and (b) a novel problem that even if $\varepsilon(s) = 0$, so $T(s)$ is a perfect proxy for $f$, we acquire additional
$s$-dependent terms arising from the fact that schooling determines test scores, and the estimated marginal
effect of schooling on earnings is biased for $\alpha_1$ unless $\mu(s) = \mu$ and $\lambda(s) = \lambda$ (that is, unless schooling has

parameter $\mu(s_T)$ is therefore the expected AFQT score evaluated at $f = 0$ and fixing covariates at the mean.

Recall that the scale of $f$ is set by our normalization of a factor loading in the test score measurement system.





no effect on test scores). Thus, we get biased estimates of the causal effect of schooling from the proxy
even if $\varepsilon(s) = 0$.
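The second source of bias can be illustrated with a small simulation. The sketch below is not the estimation procedure used in this paper: it assumes the simple case $\mu(s) = \beta s$, $\lambda(s) = \lambda$ and $\varepsilon(s) = 0$, and every parameter value, the schooling rule and the helper `ols_slopes` are hypothetical choices made only for illustration. It regresses log wages on schooling and the test score and compares the schooling coefficient with $\alpha_1 - \alpha_2\beta/\lambda$.

```python
import random

def ols_slopes(y, x1, x2):
    """Slope coefficients from an OLS regression of y on a constant, x1 and x2."""
    n = len(y)
    mean_y, mean_1, mean_2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    dy = [v - mean_y for v in y]
    d1 = [v - mean_1 for v in x1]
    d2 = [v - mean_2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * v for a, v in zip(d1, dy))
    s2y = sum(b * v for b, v in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(0)
a0, a1, a2 = 1.0, 0.10, 0.15   # hypothetical wage-equation parameters
beta, lam = 3.5, 15.0          # hypothetical test-score parameters: mu(s) = beta * s, lambda(s) = lam

log_wage, schooling, test_score = [], [], []
for _ in range(20000):
    f = rng.gauss(0.0, 1.0)                      # latent ability
    s = 12.0 + 1.5 * f + rng.gauss(0.0, 1.0)     # schooling depends on ability
    t = beta * s + lam * f                       # test score with no measurement error
    w = a0 + a1 * s + a2 * f + rng.gauss(0.0, 0.1)
    log_wage.append(w)
    schooling.append(s)
    test_score.append(t)

coef_s, coef_t = ols_slopes(log_wage, schooling, test_score)
print(f"coefficient on s: {coef_s:.4f} (true a1 = {a1}, a1 - a2*beta/lam = {a1 - a2 * beta / lam:.4f})")
print(f"coefficient on T: {coef_t:.4f} (a2/lam = {a2 / lam:.4f})")
```

Even though the simulated test score contains no measurement error, the schooling coefficient falls short of the true return whenever the test score rises with schooling.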

Using estimates of the structural model we construct a measure of ability, $\hat{f}$, correcting for endogenous
schooling at the test date (as well as correcting for family background and age effects).'*^ Estimates are
reported in both Tables 9 and 10. We find that substituting our schooling-corrected measure of ability
for OLS-residualized AFQT in a regression of log wages on years of schooling, ability, experience and
experience squared increases the estimated coefficient on schooling by over 1.5 percentage points, from an
estimated return of 10.22% to a higher 11.76% in our sample of white males. Previously used measures
of ability include the effect of schooling on ability. Purging the measure of ability of this effect results in
a substantially larger estimated effect of schooling on wages.

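If the true model for wages is instead

\[
\ln W = \alpha_0 + \alpha_1 s + \alpha_2 T + \varepsilon(W), \tag{30}
\]

so that the test score directly affects earnings as a signal of productivity (see, e.g., Altonji and Pierret,
2001), the true marginal return to schooling is

\[
\frac{\partial \ln W}{\partial s} = \alpha_1 + \alpha_2 \frac{\partial T}{\partial s}, \tag{31}
\]

where

\[
\frac{\partial T}{\partial s} = \frac{\partial \mu(s)}{\partial s} + f \frac{\partial \lambda(s)}{\partial s}. \tag{32}
\]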



Assuming $\alpha_2 > 0$, an approach that ignores the effect of $S$ on $T$ understates the total effect of schooling
on wages because it ignores its indirect effect through measured ability. Tests of the relative importance
of schooling and signals ($T$) on wages ignore the effect of $S$ on $T$. An estimated increase in AFQT score
of 3 to 4 points per year of additional schooling biases downward estimates of the return to schooling on
wages by 1.28 to 1.71 percentage points. Accordingly, in their analysis Altonji and Pierret overstate the
contribution of signalling to the growth of labor market earnings because they neglect the role of schooling
in producing the signal.
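The size of the indirect effect follows from equation (31): it is the wage coefficient on the test score multiplied by the effect of a year of schooling on the score. The sketch below works backwards from the two figures quoted above to the wage coefficient they imply; this implied value is a back-of-the-envelope illustration under that reading, not an estimate reported in this paper.

```python
# (AFQT points gained per year of schooling, reported downward bias in percentage points)
reported = [(3.0, 1.28), (4.0, 1.71)]

# Under equation (31) the omitted indirect effect is alpha_2 * dT/ds, so the
# implied wage effect of one AFQT point is the bias divided by the points gained.
implied_alpha2_pp = [bias / points for points, bias in reported]

for (points, bias), alpha2 in zip(reported, implied_alpha2_pp):
    print(f"{points:.0f} points per year, bias {bias} pp: implied {alpha2:.3f} pp of wages per AFQT point")
```

Both figures imply roughly the same wage effect per AFQT point, which is what one would expect if the reported range is this product evaluated at 3 and 4 points per year.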

These results, and the results reported in Section 5, show that it is important to address carefully
the problem of endogenous schooling effects when using measures of cognitive ability. Simply proxying
latent ability with an available test score in a wage equation does not solve the problem of ability bias
when estimating a return to schooling even if measurement error is zero unless the test score is unrelated
to schooling. Similarly, even if the measured test score, as opposed to underlying latent ability, has a
causal effect on wages, ignoring schooling effects will lead one to underestimate the effect of schooling on
wages. To identify the effects of schooling on test scores it is necessary to control for the endogeneity of
schooling decisions. Otherwise, schooling effects on ability are overstated. When we regress the test score

“*^Take, for example, the simple case where $\mu(s) = s\beta$, $\lambda(s) = \lambda$, and $\varepsilon(s) = 0$. Then the estimated coefficient on $s$ is
$\alpha_1 - \alpha_2\beta/\lambda$. Including the proxy will lead to downward-biased estimates of $\alpha_1$ if $\beta > 0$ (assuming the test is positively related
to latent ability $f$, i.e. $\lambda > 0$).

Let the symbol "^" denote a consistent estimate of a model parameter (see Appendix C for the estimation algorithm).

Then for any individual with characteristics $x$ and schooling at the test date $s_T$ let $\hat{f}$ =
where K is the number of test scores observed. 
on schooling, we get an average effect of 5.58 AFQT points per year of schooling (see Table 11). Using
quarter of birth as an instrument, as reported in Table 12, we get a lower effect of 4.51, which is still
larger than the estimate from the structural model, although not far from it. Our approach goes beyond
the standard IV method to explore the impact of schooling interventions on persons at different places of
the latent ability distribution.
Appendix A Data



This paper uses data from the National Longitudinal Survey of Youth (NLSY) to estimate the joint
model of schooling and test scores presented in Section Three. The NLSY is a representative sample of
American young men and women between the ages of 14 and 21 years of age at the time of the first interview
in 1979. The NLSY is comprised of three subsamples: (1) a random sample of 6,111 non-institutionalized
civilian youths; (2) a supplemental sample of 5,295 youths designed to oversample civilian Hispanics, blacks,
and economically disadvantaged whites; (3) a sample of 1,280 youths who were aged 17-21 as of January 1,
1979, and who were enlisted in the military as of September 30, 1978. The NLSY collects information on
parental background, schooling decisions, labor market experiences, cognitive and noncognitive test scores
and other behavioral measures on these individuals on an annual basis.

Our analysis is restricted to a sample of 2,066 white males from the main subsample for whom
there is information on schooling, parental background, and other variables affecting schooling decisions.
Parental background may include mother’s and father’s education, family income, number of siblings,
geographic information such as urban status and region of the country in which the family resides, and
whether or not the individual comes from a broken home (i.e. non-traditional family). Where information
on mother’s education, father’s education, and/or family income is missing, we impute values for the
missing variables. (Exact imputation rules are posted at our website
http://home.uchicago.edu/~kjmullen/Schooling_JOE.htm.) In addition, direct and implicit (opportunity) costs of schooling are
needed. These variables are introduced in the relevant schooling choice equations. These include tuition,
distance to school, and local labor market variables such as local wages and unemployment rates
(stratified by completed schooling level). Distance to nearest four-year college is constructed as follows: if there
exists a college in the county where a person resides then distance to nearest college is zero; otherwise we
compute distance in miles to the nearest county with a college, measuring distance between two counties
as the distance between their two centers. This distance is constructed using county of residence at age
17; for those individuals older than 17 in 1979 we use the county of residence in 1979. Tuition at age 17
is the average tuition in colleges in county of residence at age 17. If there is no college in the county,
then average tuition in the state is used instead. Local labor market variables for the county of residence
are gathered from the 5% sample in the 1980 census. We compute local unemployment rates and average
local wages for high school dropouts, high school graduates, and individuals with some college. We assume
that the 1980 variables are a close proxy for local labor market conditions in the years in which NLSY
respondents are assumed to be making the schooling decisions analyzed in this paper. Appendix tables
A-1 and A-2 present means and standard deviations for the variables stratified by final schooling and by
schooling at the test date, respectively, and overall.
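The construction rules for the distance and tuition variables can be sketched as follows. This is an illustration only: the text does not state how the distance between county centers is computed, so the great-circle (haversine) formula, the Earth radius constant, the function names and the example counties are all assumptions of this sketch.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch

def great_circle_miles(p, q):
    """Haversine distance in miles between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def distance_to_nearest_college(county, centers, has_college):
    """Zero if the county has a college; otherwise miles to the nearest county center with one."""
    if has_college[county]:
        return 0.0
    return min(great_circle_miles(centers[county], centers[other])
               for other, flag in has_college.items() if flag)

def tuition_at_17(county, county_tuitions, state_average):
    """Average tuition over colleges in the county; state average if the county has none."""
    fees = county_tuitions.get(county, [])
    return sum(fees) / len(fees) if fees else state_average

# Hypothetical counties: A has no college, B and C do.
centers = {"A": (41.0, -87.0), "B": (41.0, -88.0), "C": (42.0, -87.0)}
has_college = {"A": False, "B": True, "C": True}
county_tuitions = {"B": [900.0, 1100.0], "C": [2000.0]}

print(distance_to_nearest_college("A", centers, has_college))
print(tuition_at_17("A", county_tuitions, state_average=1500.0))
```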

In 1980, NLSY participants were administered a series of achievement tests known as the Armed
Forces Vocational Aptitude Battery (ASVAB). The math and verbal scores of the ASVAB can be
aggregated into a measure called the Armed Forces Qualification Test (AFQT). These include tests of Word
Knowledge, Paragraph Comprehension, Arithmetic Reasoning and Mathematics Knowledge. In our
application AFQT is constructed as the sum of these four tests. Appendix table A-3 presents the number of
questions, allotted completion time, and maximum possible score for each test. In addition, the fraction
of individuals who attain the maximum possible score on each test is presented, overall and by schooling
group at the test date. Accounting for top censoring by modeling the AFQT distribution as the sum of
censored-normal subcomponents is empirically relevant: while only seven individuals out of 2,066 (less
than half of one percent) achieve the maximum possible AFQT score (by achieving the maximum score on
all four test components), 19% of the sample (391 individuals) attain the maximum score on at least one
of the four tests. The table reveals that accounting for top censoring is more important for people with
higher levels of schooling at the time of the test; in some cases twenty to thirty percent of the individuals
in those groups are censored on one or more components.
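To make the role of top censoring concrete, the sketch below writes the log-likelihood contribution of a single test component under a generic top-censored normal model: the normal density for scores below the ceiling and the upper-tail probability for scores at the ceiling. It is a textbook illustration and not the likelihood of the structural model (which is described in Appendix C); the function names and the ceiling of 35 used in the example are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_normal_loglik(score, max_score, mean, sd):
    """Log-likelihood of one score when the latent normal score is censored from above."""
    z = (min(score, max_score) - mean) / sd
    if score >= max_score:
        # Censored observation: probability that the latent score is at or above the ceiling.
        return math.log(1.0 - normal_cdf(z))
    # Uncensored observation: normal density of the observed score.
    return math.log(normal_pdf(z) / sd)

# Hypothetical component with a ceiling of 35 points.
print(censored_normal_loglik(30.0, 35.0, mean=28.0, sd=5.0))  # below the ceiling
print(censored_normal_loglik(35.0, 35.0, mean=28.0, sd=5.0))  # at the ceiling
```

Treating ceiling scores as ordinary observations would understate the latent mean for groups in which many respondents reach the maximum, which is the concern raised for the higher schooling groups.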

The NLSY contains longitudinal data on highest grade completed, year last enrolled in school (if
not currently enrolled), high school degree or equivalency status, type of degree (diploma or GED) for
the years 1979-2000 as well as highest degree attained and year highest degree attained after 1988. Final
schooling categories were constructed primarily using degree information from last year observed provided
that the respondent was 21 or older if the final state was coded as high school dropout or high school
graduate or that the respondent was 25 or older if the individual attended some college. GED recipients
were classified as high school dropouts. For those individuals without specific degree information, the
highest grade completed variable was used. For the remaining 2% of the sample, the age restriction was
relaxed provided the last year the respondent was enrolled was two years prior to last observed schooling
state. The age restriction was placed to ensure that individuals who were actually censored were not
mistakenly included in the sample; for example an individual who dropped out of the sample at age
eighteen with a high school degree may have gone on to attend some college or complete a four-year degree
and should not be coded as a high school graduate. In addition, 53 cases were discarded from the sample
due to inconsistent schooling history or lack of sufficient information to conclude schooling status (final or
at the test date, see below).

Schooling level and enrollment status at the test date were constructed as follows. The ASVAB was
administered during July-October 1980. Respondents were interviewed during January-August 1980 and
again in January-July 1981. Note also that the NLSY constructs a measure of schooling and enrollment
status as of May 1 of each survey year. Since the academic year commonly ends in June (May for college),
individuals typically advance to a higher completed grade level in May/June. We use highest grade
completed and enrollment status as reported in the 1980 survey as measures of schooling and enrollment
at the test date if the interview was conducted during July-August 1980; otherwise we use the variables
reported in 1981 if the survey was conducted during January-April 1981. For those remaining we use
the NLSY-constructed variables for May 1, 1981. We re-coded schooling state at the test date for 32
individuals to be compatible with final schooling state (mostly changing highest grade completed at the
test date to 11 for high school dropouts). For the remaining 1% of the sample we used schooling and
enrollment histories to come up with plausible categories for schooling at the test date.
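The precedence rule for choosing the record that measures schooling at the test date can be sketched as a small function. The function name, argument names and returned labels are hypothetical; only the date windows come from the description above.

```python
from datetime import date

def schooling_source(interview_1980, interview_1981):
    """Return which record supplies schooling and enrollment at the test date.

    Each argument is the interview date in that survey round, or None if missing.
    """
    if interview_1980 is not None and date(1980, 7, 1) <= interview_1980 <= date(1980, 8, 31):
        return "1980 survey"
    if interview_1981 is not None and date(1981, 1, 1) <= interview_1981 <= date(1981, 4, 30):
        return "1981 survey"
    return "NLSY May 1, 1981 variables"

print(schooling_source(date(1980, 7, 15), date(1981, 2, 1)))   # interviewed during the test window
print(schooling_source(date(1980, 3, 10), date(1981, 2, 1)))   # early 1980 interview, early 1981 interview
print(schooling_source(date(1980, 3, 10), date(1981, 6, 5)))   # falls back to the May 1 variables
```

The ordering matters because a 1980 interview held before the end of the academic year would record the grade completed before the summer in which the test was taken.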

In addition to schooling categories, measures of age-at-entry group were constructed. For those
individuals who finished school before 1979 the survey asks the date at which they were last enrolled and
the highest grade they had completed at that date. Recall that we assume continuous schooling profiles
so that there are no skips or breaks in schooling from age at initial entry forward. We constructed our
measure of age at initial entry date as follows. For those individuals enrolled in school in 1979, we let age
at initial entry date equal age in 1979 minus years of schooling completed in 1979. For those individuals
who had finished school prior to 1979 we made the same calculation using highest grade completed and
age at last date of enrollment. In our empirical work we constructed two categories of endogenous entry
status: “behind” if age at initial entry is greater than 6 years (the median age of entry), “normal” if age
at initial entry is 6 years or less.
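Under the continuous-schooling assumption, the entry classification reduces to a one-line calculation. The sketch below assumes that age at initial entry is age minus completed years of schooling, so that the result is a positive age comparable with the 6-year threshold; the function names are illustrative.

```python
MEDIAN_ENTRY_AGE = 6  # median age of entry, as stated in the text

def age_at_initial_entry(age, grades_completed):
    """Implied age at school entry, assuming no skips or breaks in schooling."""
    return age - grades_completed

def entry_status(age, grades_completed):
    """Classify a respondent as "behind" or "normal" relative to the median entry age."""
    if age_at_initial_entry(age, grades_completed) > MEDIAN_ENTRY_AGE:
        return "behind"
    return "normal"

print(entry_status(17, 11))  # implied entry at age 6
print(entry_status(18, 11))  # implied entry at age 7
```

For respondents who left school before 1979, the same function would be applied to age at the last date of enrollment and highest grade completed at that date.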
Appendix B Estimation Procedure for Control Function Model

Consider the case in which we assume no age effects and random entry into schooling. We group individuals
into six categories of schooling at test date and four categories of completed schooling. The combinations
of schooling at test date and final schooling are represented by the matrix

\[
A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
\sim   & a_{42} & a_{43} & a_{44} \\
\sim   & \sim   & a_{53} & a_{54} \\
\sim   & \sim   & \sim   & a_{64}
\end{pmatrix} \tag{B.1}
\]

where $a_{ij}$ represents the average test score for individuals with level $i$ of schooling at the test date and
level $j$ of final schooling. A "$\sim$" means that no observations are available for that cell.

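Since we can only identify ratios of the loadings $\lambda(j)$, $j = 1, \ldots, 6$, we normalize $\lambda(1) = 1$. Then we
have six conditions identifying $\lambda(2)$:

\[
\begin{aligned}
\lambda(2) &= \frac{a_{21} - a_{2j}}{a_{11} - a_{1j}}, \quad j = 2, 3, 4, \\
\lambda(2) &= \frac{a_{22} - a_{2j}}{a_{12} - a_{1j}}, \quad j = 3, 4, \\
\lambda(2) &= \frac{a_{23} - a_{24}}{a_{13} - a_{14}}.
\end{aligned}
\]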




We impose the restrictions using a minimum distance framework. Let $Y_2(A)$ be the vector of the six
unrestricted estimators. $\lambda(2)$ is estimated by minimizing

\[
q(\lambda(2)) = (Y_2(A) - \iota\lambda(2))' \, W_2 \, (Y_2(A) - \iota\lambda(2)), \tag{B.2}
\]

where $\iota$ is a vector of ones and $W_2$ is the inverse of an estimate of the asymptotic covariance matrix of $Y_2(A)$. The minimum is
easily found to be

\[
\hat{\lambda}(2) = (\iota' W_2 \iota)^{-1} \iota' W_2 Y_2(A). \tag{B.3}
\]
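The two steps above, forming every available ratio condition and then combining them as in (B.3), can be sketched in a few lines. The cell means below are hypothetical values generated from a model in which the true loading ratio for the second schooling level is 0.8, and an identity weight matrix stands in for the estimated $W_2$; the function names are our own.

```python
def loading_ratio_conditions(cell_means, t):
    """All unrestricted estimates of lambda(t)/lambda(1) from a matrix of cell means.

    cell_means[i][j] is the mean score for test-date level i+1 and final level j+1,
    or None when the cell is empty.
    """
    base, row = cell_means[0], cell_means[t - 1]
    cols = [j for j in range(len(base))
            if base[j] is not None and row[j] is not None]
    return [(row[j] - row[k]) / (base[j] - base[k])
            for pos, j in enumerate(cols) for k in cols[pos + 1:]]

def minimum_distance_scalar(estimates, weight):
    """(iota' W iota)^{-1} iota' W y for a weight matrix given as a list of rows."""
    size = len(estimates)
    numerator = sum(weight[i][j] * estimates[j]
                    for i in range(size) for j in range(size))
    denominator = sum(sum(r) for r in weight)
    return numerator / denominator

# Hypothetical cell means generated from a_ij = mu(i) + lambda(i) * c1(j).
mu = [10.0, 14.0, 18.0, 21.0, 23.0, 25.0]
lam = [1.0, 0.8, 0.7, 0.6, 0.5, 0.4]
c1 = [-1.0, -0.2, 0.3, 0.9]
cells = [[mu[i] + lam[i] * c1[j] if j >= max(0, i - 2) else None
          for j in range(4)] for i in range(6)]

conditions_2 = loading_ratio_conditions(cells, 2)
identity = [[1.0 if i == j else 0.0 for j in range(len(conditions_2))]
            for i in range(len(conditions_2))]
lambda_2_hat = minimum_distance_scalar(conditions_2, identity)
print(len(conditions_2), lambda_2_hat)
```

With an identity weight matrix the estimator is the simple average of the ratio conditions; an estimated inverse covariance matrix gives more weight to conditions built from more precisely estimated cell means.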

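For $\lambda(3)$ we have six similar conditions:

\[
\begin{aligned}
\lambda(3) &= \frac{a_{31} - a_{3j}}{a_{11} - a_{1j}}, \quad j = 2, 3, 4, \\
\lambda(3) &= \frac{a_{32} - a_{3j}}{a_{12} - a_{1j}}, \quad j = 3, 4, \\
\lambda(3) &= \frac{a_{33} - a_{34}}{a_{13} - a_{14}},
\end{aligned}
\]

with a similar expression for the minimum distance estimator (but with a different weight matrix $W_3$).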
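For $\lambda(4)$ we only have three conditions:

\[
\begin{aligned}
\lambda(4) &= \frac{a_{42} - a_{4j}}{a_{12} - a_{1j}}, \quad j = 3, 4, \\
\lambda(4) &= \frac{a_{43} - a_{44}}{a_{13} - a_{14}}.
\end{aligned}
\]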
Finally, for $\lambda(5)$ there is only one condition:

\[
\lambda(5) = \frac{a_{53} - a_{54}}{a_{13} - a_{14}}. \tag{B.4}
\]

Here no minimum distance approach is needed. $\lambda(6)$ is not identified.

With all loadings estimated we can estimate intercepts $\mu(1), \ldots, \mu(5)$ and control functions
$c_1(j) = E[f \mid S = j]$, $j = 1, \ldots, 4$; and $c_2(j) = E[f \mid S_T = j]$, $j = 1, \ldots, 6$. The model implies the following restrictions:

\[
\begin{aligned}
a_{1j} &= \mu(1) + \lambda(1) c_1(j), \quad j = 1, 2, 3, 4, \\
a_{2j} &= \mu(2) + \lambda(2) c_1(j), \quad j = 1, 2, 3, 4, \\
a_{3j} &= \mu(3) + \lambda(3) c_1(j), \quad j = 1, 2, 3, 4, \\
a_{4j} &= \mu(4) + \lambda(4) c_1(j), \quad j = 2, 3, 4, \\
a_{5j} &= \mu(5) + \lambda(5) c_1(j), \quad j = 3, 4,
\end{aligned}
\]

and

\[
\frac{\sum_{j=1}^{4} n_{ij} a_{ij}}{\sum_{j=1}^{4} n_{ij}} = \mu(i) + \lambda(i) c_2(i), \quad i = 1, \ldots, 6,
\]

where $n_{ij}$ is the sample size of cell $(i, j)$. Finally the restriction

\[
\sum_{j=1}^{4} \frac{n_j}{n} c_1(j) = 0
\]

is imposed, where $n_j = \sum_{i=1}^{6} n_{ij}$ is the count of individuals with final schooling level $j$. These conditions imply
24 restrictions on the 15 parameters $\theta = \mu(1), \ldots, \mu(5); c_1(1), \ldots, c_1(4); c_2(1), \ldots, c_2(6)$. The minimum
distance problem is to minimize



\[
q(\theta) = (Y(A) - H\theta)' \, W \, (Y(A) - H\theta), \tag{B.5}
\]

where $Y(A)$ is a linear function of the $A$ elements, $H$ is a known matrix (given estimates of the loadings)
and $W$ is the inverse of an estimate of the covariance matrix of $Y(A)$. The minimum distance estimate of $\theta$ is

\[
\hat{\theta} = (H' W H)^{-1} H' W Y(A).
\]

Extending to the case controlling for endogenous entry into schooling is straightforward. 
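For readers who want to see the linear minimum distance estimator in computational form, the following sketch evaluates $(H'WH)^{-1}H'WY$ with plain Python lists. It is a generic illustration rather than the code used for the estimates in this paper: the small design matrix, the weights and the parameter values in the example are hypothetical, and the linear system is solved by ordinary Gaussian elimination.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def minimum_distance(y, h, w):
    """Return theta_hat = (H' W H)^{-1} H' W y."""
    m, p = len(y), len(h[0])
    wh = [[sum(w[i][k] * h[k][j] for k in range(m)) for j in range(p)] for i in range(m)]
    wy = [sum(w[i][k] * y[k] for k in range(m)) for i in range(m)]
    hwh = [[sum(h[k][i] * wh[k][j] for k in range(m)) for j in range(p)] for i in range(p)]
    hwy = [sum(h[k][i] * wy[k] for k in range(m)) for i in range(p)]
    return solve_linear_system(hwh, hwy)

# Hypothetical example: four restrictions that are exactly linear in two parameters.
h = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
theta_true = [2.0, -1.0]
y = [sum(h_row[j] * theta_true[j] for j in range(2)) for h_row in h]
w = [[(i + 1.0) if i == j else 0.0 for j in range(4)] for i in range(4)]  # diagonal weights

theta_hat = minimum_distance(y, h, w)
print(theta_hat)
```

When the restrictions hold exactly, any positive definite weight matrix recovers the same parameter vector; the choice of $W$ matters only for efficiency when the cell means are estimated with error.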
Appendix B.1 Allowing for Endogenous Entry

The combinations of schooling at test date and final schooling are represented by the matrices

\[
A = \begin{pmatrix}
a_{11} & \sim   & \sim   & \sim   \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
\sim   & a_{42} & a_{43} & a_{44} \\
\sim   & \sim   & a_{53} & a_{54} \\
\sim   & \sim   & \sim   & a_{64}
\end{pmatrix},
\qquad
B = \begin{pmatrix}
b_{11} & b_{12} & b_{13} & b_{14} \\
b_{21} & b_{22} & b_{23} & b_{24} \\
b_{31} & b_{32} & b_{33} & b_{34} \\
\sim   & b_{42} & b_{43} & b_{44} \\
\sim   & \sim   & b_{53} & b_{54} \\
\sim   & \sim   & \sim   & b_{64}
\end{pmatrix} \tag{B.7}
\]

where $a_{ij}$ represents average test score for individuals with level $i$ of schooling at test date, level $j$ of final
schooling and who started school at a normal or early age ($N = 0$). $b_{ij}$ represents average test score
for individuals with level $i$ of schooling at test date, level $j$ of final schooling and who started school late
($N = 1$). A "$\sim$" means that no observations are available for that cell.^

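Since we can only identify ratios of the loadings $\lambda(j)$, $j = 1, \ldots, 6$, we normalize $\lambda(1) = 1$. As before the restrictions are imposed in a minimum distance framework.

We now have seven conditions identifying $\lambda(2)$:

$$\lambda(2) = \frac{a_{21} - b_{21}}{a_{11} - b_{11}},$$
$$\lambda(2) = \frac{b_{21} - b_{2j}}{b_{11} - b_{1j}}, \quad j = 2,3,4,$$
$$\lambda(2) = \frac{b_{22} - b_{2j}}{b_{12} - b_{1j}}, \quad j = 3,4,$$
$$\lambda(2) = \frac{b_{23} - b_{24}}{b_{13} - b_{14}}.$$

There are seven conditions for $\lambda(3)$:

$$\lambda(3) = \frac{a_{31} - b_{31}}{a_{11} - b_{11}},$$
$$\lambda(3) = \frac{b_{31} - b_{3j}}{b_{11} - b_{1j}}, \quad j = 2,3,4,$$
$$\lambda(3) = \frac{b_{32} - b_{3j}}{b_{12} - b_{1j}}, \quad j = 3,4,$$
$$\lambda(3) = \frac{b_{33} - b_{34}}{b_{13} - b_{14}}.$$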
$\lambda(4)$ and $\lambda(5)$ have the same conditions as for case 1 (this is because the cells $a(1,2)$ and $a(1,3)$ are empty).

Note that it is not possible to estimate the ratio $\lambda(6)/\lambda(1)$ as this would require observations in the $a(1,4)$ cell. However, we can estimate the ratio $\lambda(6)/\lambda(2)$ by the condition:

$$\frac{\lambda(6)}{\lambda(2)} = \frac{a_{64} - b_{64}}{a_{24} - b_{24}}.$$

With this estimate in hand, we can obtain an estimate of the ratio $\lambda(6)/\lambda(1)$ by multiplying with the estimate of $\lambda(2)/\lambda(1)$.
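The identifying conditions can be checked numerically on cell means generated from the model itself. In the sketch below every number (intercepts, loadings, control-function values) is hypothetical, and `cell_mean` and `loading_conditions` are illustrative helper names, not something defined in the paper.

```python
# Hypothetical intercepts mu(i), loadings lambda(i), and control functions c(j, n),
# where n = 0 is a normal/early start and n = 1 is a late start.
MU = {1: 57.0, 2: 66.0, 6: 95.0}
LAM = {1: 1.0, 2: 0.8, 6: 0.4}
C = {(1, 0): -11.0, (2, 0): -5.0, (3, 0): 2.0, (4, 0): 16.0,
     (1, 1): -17.0, (2, 1): -12.0, (3, 1): -1.0, (4, 1): 14.0}


def cell_mean(i, j, n):
    """Model-implied cell mean: mu(i) + lambda(i) * c(j, n)."""
    return MU[i] + LAM[i] * C[(j, n)]


# Only the cells used below are filled in; a(1, j) for j >= 2 is treated as empty.
a = {(1, 1): cell_mean(1, 1, 0), (6, 4): cell_mean(6, 4, 0)}
a.update({(2, j): cell_mean(2, j, 0) for j in (1, 2, 3, 4)})
b = {(i, j): cell_mean(i, j, 1) for i in (1, 2) for j in (1, 2, 3, 4)}
b[(6, 4)] = cell_mean(6, 4, 1)


def loading_conditions(a, b, i):
    """Seven ratio conditions for lambda(i) / lambda(1), for i = 2 or 3."""
    conds = [(a[(i, 1)] - b[(i, 1)]) / (a[(1, 1)] - b[(1, 1)])]
    for j0, others in ((1, (2, 3, 4)), (2, (3, 4)), (3, (4,))):
        for j in others:
            conds.append((b[(i, j0)] - b[(i, j)]) / (b[(1, j0)] - b[(1, j)]))
    return conds


conds_2 = loading_conditions(a, b, 2)
ratio_6_2 = (a[(6, 4)] - b[(6, 4)]) / (a[(2, 4)] - b[(2, 4)])
# Chain the ratios: lambda(6)/lambda(1) = [lambda(6)/lambda(2)] * [lambda(2)/lambda(1)].
lam_6 = ratio_6_2 * (sum(conds_2) / len(conds_2))
```

With noiseless cell means all seven conditions return the same value; in sample data they differ, which is why the restrictions are combined by minimum distance.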

With all loadings estimated we can estimate intercepts $\mu(1), \ldots, \mu(6)$ and control functions $c(1,0), \ldots, c(4,0)$, $c(1,1), \ldots, c(4,1)$. The model implies the following restrictions:

$$a_{11} = \mu(1) + \lambda(1)\,c(1,0),$$
$$b_{1j} = \mu(1) + \lambda(1)\,c(j,1), \quad j = 1,2,3,4,$$
$$a_{2j} = \mu(2) + \lambda(2)\,c(j,0), \quad j = 1,2,3,4,$$
$$b_{2j} = \mu(2) + \lambda(2)\,c(j,1), \quad j = 1,2,3,4,$$
$$a_{3j} = \mu(3) + \lambda(3)\,c(j,0), \quad j = 1,2,3,4,$$
$$b_{3j} = \mu(3) + \lambda(3)\,c(j,1), \quad j = 1,2,3,4,$$
$$a_{4j} = \mu(4) + \lambda(4)\,c(j,0), \quad j = 2,3,4,$$
$$b_{4j} = \mu(4) + \lambda(4)\,c(j,1), \quad j = 2,3,4,$$
$$a_{5j} = \mu(5) + \lambda(5)\,c(j,0), \quad j = 3,4,$$
$$b_{5j} = \mu(5) + \lambda(5)\,c(j,1), \quad j = 3,4,$$
$$a_{64} = \mu(6) + \lambda(6)\,c(4,0),$$
$$b_{64} = \mu(6) + \lambda(6)\,c(4,1),$$
$$\sum_{s=1}^{4} \sum_{n=0}^{1} c(s,n)\,P(s,n) = 0.$$

These conditions imply 34 restrictions on the 14 parameters $\theta = \mu(1), \ldots, \mu(6), c(1,0), \ldots, c(4,1)$. The minimum distance estimator is used to impose the conditions.

Note: Due to the small size of our sample, we do not observe individuals in the cells $i = 1$ and $j \geq 2$ of matrix $A$.







Appendix C Estimation Procedure for Structural Model 

Let $S \in \{1, \ldots, \bar{S}\}$ denote joint choice of completed schooling and age at entry. For clarity we create a special notation in this appendix and let $Z(s)$ be the set of $Z$ variables with nonzero coefficients in the choice equation:

$$V(s) = Z(s)'\gamma(s) + \alpha(s) f + u(s), \quad s = 1, \ldots, \bar{S},$$

where $u(s) \sim N(0, 1)$. We observe $S = \arg\max_s \{V(s)\}$.

Let $S_T \in \{1, \ldots, \bar{S}_T\}$ be observed schooling at test date. Let $T^*(s_T)$ be the vector of latent test scores conditional on schooling level $s_T$, where $T_k^*(s_T)$ denotes the $k$th test. For $k = 1, \ldots, K$, let

$$T_k^*(s_T) = X(s_T)\beta_k(s_T) + \lambda_k(s_T) f + \varepsilon_k(s_T), \quad s_T = 1, \ldots, \bar{S}_T,$$

where $\varepsilon_k(s_T) \sim N(0, \sigma_k(s_T)^2)$. We observe

$$T_k = \begin{cases} T_k^*(s_T) & \text{if } T_k^*(s_T) < c_k \\ c_k & \text{if } T_k^*(s_T) \geq c_k \end{cases}$$

if and only if $S_T = s_T$, where $c_k$ is the known maximum value of test $k$.
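The observation rule amounts to top-coding the latent score at the test maximum. The sketch below simulates it in plain Python; the mean and standard deviation are hypothetical, the ceiling of 35 is the maximum raw score for Word Knowledge reported in Table A-3, and the function names are illustrative assumptions.

```python
import random


def observed_score(latent, ceiling):
    """Report the latent score unless it reaches the known test maximum."""
    return latent if latent < ceiling else ceiling


def simulate_observed(n, mean, sd, ceiling, seed=0):
    """Draw n latent scores and return (observed scores, share censored at the ceiling)."""
    rng = random.Random(seed)
    latent = [rng.gauss(mean, sd) for _ in range(n)]
    observed = [observed_score(t, ceiling) for t in latent]
    share_censored = sum(t >= ceiling for t in latent) / n
    return observed, share_censored


# Hypothetical latent distribution; only the ceiling comes from Table A-3.
scores, share = simulate_observed(n=1000, mean=30.0, sd=5.0, ceiling=35.0)
```

Because a censored score only reveals that the latent score lies at or above the ceiling, the sampler later redraws those latent scores from a truncated normal instead of treating the ceiling as the true value.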

Let

$$f \sim \sum_{j=1}^{J} p_j\, N(\mu_j, \sigma_j^2).$$

We set $J = 3$.

Let $\theta$ denote the vector of model parameters $\{\gamma(s), \alpha(s)\}_{s=1}^{\bar{S}}$, $\{\beta_k(s), \lambda_k(s), \sigma_k(s)\}_{k=1,\,s=1}^{K,\,\bar{S}_T}$. We estimate the model parameters via Bayesian Markov Chain Monte Carlo (MCMC) methods. The goal is to sample from the posterior distribution of the parameters $\theta$ and the parameters of the distribution of the factor $f$, conditional on observed outcomes, $S$ and $S_T$, and covariates from a random sample of individuals indexed $i = 1, \ldots, n$. The posterior is only a computational device: we are doing maximum likelihood-based inference using MCMC as a computational tool. We impose a noninformative flat prior on all slope coefficients, $\gamma$ and $\beta$. We put proper priors on the variance parameters from the Inverse Gamma family of distributions and on the factor loadings from the Normal distribution family. Under standard regularity conditions, the priors are asymptotically irrelevant.

The data for each individual are test scores, schooling at test date and completed schooling/entry age, $T(S_T), S_T, S$, plus covariates, $A, X, Z$, where $A$ is age at test date. The likelihood contribution for one individual is



$$\begin{aligned}
p(T(S_T) = t,\ S = s,\ S_T = s_T \mid A = a, X = x, Z = z) = \int\ & p(t \mid S = s, S_T = s_T, A = a, X = x, f) \\
\times\ & \Pr(S_T = s_T \mid S = s, A = a, X = x, Z = z, f) \\
\times\ & \Pr(S = s \mid f, Z = z)\, p(f)\, df. \qquad \text{(C.1)}
\end{aligned}$$

However, this likelihood simplifies due to the exact dependence between $S$, $A$, $S_T$ described in the text. In particular,

$$\Pr(S_T = s_T \mid S = s, A = a, X = x, Z = z, f) = 1(s_T = a)\,1(a < s_c) + 1(s_T = s_c)\,1(a \geq s_c), \qquad \text{(C.2)}$$

where $s_c$ denotes the schooling dimension of $S$ (recall that $S$ includes states for both completed schooling and age of entry). So the likelihood contribution for an individual who has not yet completed schooling is



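$$\begin{aligned}
p(T(S_T) = t,\ S = s,\ S_T = s_T \mid A = a, X = x, Z = z) = \int\ & p(t \mid S = s, S_T = s_T, A = a, X = x, f, s_T = a) \\
\times\ & \Pr(S = s \mid f, Z = z)\, p(f)\, df,
\end{aligned}$$

while the likelihood contribution for an individual who has completed schooling is

$$\begin{aligned}
p(T(S_T) = t,\ S = s,\ S_T = s_T \mid A = a, X = x, Z = z) = \int\ & p(t \mid S = s, S_T = s_T, A = a, X = x, f, s_T = s_c) \\
\times\ & \Pr(S = s \mid f, Z = z)\, p(f)\, df.
\end{aligned}$$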




The two likelihood contributions are functionally identical; the only difference is what value of $s_T$ is conditioned on.

To resolve the high dimensional integrals in these likelihood contributions we augment the likelihood with latent utilities $V$, determining choice of $S$, factors $f$ and latent test scores $T^*$:

$$p(T^*, V, f, S_T = s_T \mid A = a, X = x, Z = z) = p(T^* \mid S_T = s_T, A = a, X = x, f)\; p(V \mid f, Z = z)\; p(f), \qquad \text{(C.3)}$$

where $s_T$ is either $a$ or completed schooling depending on the individual having completed schooling or not at test date. Integration of (C.3) with respect to $V, T^*, f$ leads us back to the original likelihood.

The (augmented) sample likelihood is defined as the product of (C.3) over all individuals. We can easily 
implement a Gibbs sampling algorithm which samples iteratively from the posterior distributions of the 
parameters and latent data conditional on the observed data. The stationary distribution of the Markov 
chain generated by this algorithm is the joint posterior distribution of the parameters. 

The MCMC algorithm is implemented as follows. Given initial starting values for the parameters and $V, f, T^*$, for $m = 1, 2, \ldots$ we can update the values of the other parameters and sample from the following conditional distributions (note that we implicitly are conditioning on the data as well as all other parameters). Table C-1 summarizes the specifications of the prior distributions for the estimates reported in the paper.

1. The conditional posterior distribution of the latent utilities $V$ is just the product of the individual conditional posterior distributions of $V_i$ by independence. Let $S_i$ be observed final schooling for individual $i$ and let $Z_i(s)$ be all covariates entering schooling alternative $s$. The elements of $V_i$ are sampled from truncated normals (as in McCulloch and Rossi (1994)),

$$V_i(s) \sim \begin{cases}
TN_{[\max_{j \neq s}\{V_i(j)\},\, \infty)}\big(Z_i(s)\gamma(s) + \alpha(s) f_i,\ 1\big), & \text{if } s = S_i, \\
TN_{(-\infty,\, V_i(S_i)]}\big(Z_i(s)\gamma(s) + \alpha(s) f_i,\ 1\big), & \text{if } s \neq S_i.
\end{cases}$$

2. Conditional on $V = \{V_i(s)\}_{i,s}$, the distribution of $\gamma(s)$ follows from a classical linear regression model with noninformative prior,

$$\gamma(s) \sim N(\hat{\gamma}(s), \Pi(s)), \quad s = 1, \ldots, \bar{S},$$

where

$$\hat{\gamma}(s) = (Z(s)'Z(s))^{-1} Z(s)'(V(s) - \alpha(s) f),$$
$$\Pi(s) = (Z(s)'Z(s))^{-1},$$

where $V(s) = (V_1(s), \ldots, V_n(s))$ and $Z(s)' = (Z_1(s), \ldots, Z_n(s))$.

3. Assuming a normal prior $N(\mu_\alpha, \psi_\alpha^2)$, the conditional distribution of $\alpha(s)$ is:

$$\alpha(s) \sim N(\bar{\alpha}(s), \Omega(s)), \quad s = 1, \ldots, \bar{S} - 1,$$

where

$$\bar{\alpha}(s) = \Omega(s)\left( f'(V(s) - Z(s)\gamma(s)) + \frac{\mu_\alpha}{\psi_\alpha^2} \right), \qquad \Omega(s) = \left( f'f + \frac{1}{\psi_\alpha^2} \right)^{-1}.$$

$\alpha(\bar{S})$ is set to zero for identification. Similarly, coefficients for covariates common across alternatives are set to zero for $\gamma(\bar{S})$. We set $\mu_\alpha = 0$ and $\psi_\alpha^2 = 1$.

4. For each test equation, $k = 1, \ldots, K$, at each schooling level, $s = 1, \ldots, \bar{S}_T$, we estimate the coefficients on the control variables as follows:

$$\beta_k(s) \sim N\Big( (X(s)'X(s))^{-1} X(s)'(T_k^*(s) - \lambda_k(s) f),\ \ \sigma_k(s)^2 (X(s)'X(s))^{-1} \Big),$$

where only those individuals who have completed schooling level $s$ at the test date are included.

5. The factor loadings in the test equations are sampled as

$$\lambda_k(s) \sim N\big( \bar{\lambda}_k(s), \Omega_k(s) \big), \quad k = 1, \ldots, K;\ s = 1, \ldots, \bar{S}_T,$$

where

$$\bar{\lambda}_k(s) = \Omega_k(s) \left( \frac{f'(T_k^*(s) - X(s)\beta_k(s))}{\sigma_k(s)^2} + \frac{\mu_k(s)}{\psi_k(s)} \right), \qquad \Omega_k(s) = \left( \frac{f'f}{\sigma_k(s)^2} + \frac{1}{\psi_k(s)} \right)^{-1},$$

using only the individuals who have schooling level $s$ at the test date and using a normal prior $N(\mu_k(s), \psi_k(s))$. We use $\mu_k(s) = 0$ and $\psi_k(s) = 1$.

6. Assuming an Inverse Gamma prior $IG(a_s, b_s)$ and letting $n(s)$ be the number of individuals in schooling group $s$ at the test date, the conditional posterior of $\sigma_k(s)^2$ is again Inverse Gamma, with parameters updated by $n(s)$ and the residual sum of squares $e_k(s)'e_k(s)$, where $e_k(s) = T_k^*(s) - X(s)\beta_k(s) - \lambda_k(s) f$. We set $a_s = 2$ and $b_s = 1$.

7. The factors $f$ and the parameters of the factor distribution are sampled as follows. Let $g_i \in \{1, \ldots, J\}$ denote the mixture component from which $f_i$ is sampled. Note that $g_i$ is unobserved. Conditional on $g_i$ the conditional distribution of $f_i$ is easily found to be

$$f_i \sim N(\bar{f}_i, \tau_i),$$

where

$$\bar{f}_i = \tau_i \left[ \sum_{s=1}^{\bar{S}} \alpha(s)\big(V_i(s) - Z_i(s)\gamma(s)\big) + \sum_{k=1}^{K} \frac{\lambda_k(s_{Ti})}{\sigma_k(s_{Ti})^2}\big(T_{ik}^* - X_i(s_{Ti})\beta_k(s_{Ti})\big) + \frac{1}{\sigma_{g_i}^2}\,\mu_{g_i} \right],$$

$$\tau_i = \left( \sum_{s=1}^{\bar{S}} \alpha(s)^2 + \sum_{k=1}^{K} \frac{\lambda_k(s_{Ti})^2}{\sigma_k(s_{Ti})^2} + \frac{1}{\sigma_{g_i}^2} \right)^{-1},$$

where $s_{Ti}$ is individual $i$'s schooling at test date.

Conditional on $f$ the mixture parameters are sampled by the usual trick of first updating the $g_i$ indicators and then sampling the mixture parameters conditional on the $g_i$'s, cf. Robert and Casella (1999). We impose the restriction $E(f) = 0$ using the method in Richardson et al. (2001).

8. The test scores for individuals who hit the ceiling on a test are sampled from truncated normals, i.e.,

$$T_{ik}^* \sim TN_{[c_k,\, \infty)}\big( X_i(s_{Ti})'\beta_k(s_{Ti}) + \lambda_k(s_{Ti}) f_i,\ \sigma_k(s_{Ti})^2 \big).$$

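Two building blocks of the sampler above, a truncated normal draw (steps 1 and 8) and a conjugate normal update for a scalar factor loading (step 5), can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are invented here, the truncated draw uses the inverse-CDF method (which loses accuracy far in the tails), and the example inputs are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def draw_truncated_normal(mean, sd, lower=None, upper=None, rng=random):
    """Inverse-CDF draw from N(mean, sd^2) restricted to [lower, upper]."""
    p_lo = 0.0 if lower is None else _STD_NORMAL.cdf((lower - mean) / sd)
    p_hi = 1.0 if upper is None else _STD_NORMAL.cdf((upper - mean) / sd)
    u = p_lo + (p_hi - p_lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf is undefined at exactly 0 and 1
    x = mean + sd * _STD_NORMAL.inv_cdf(u)
    # Guard against rounding pushing the draw just outside the bounds.
    if lower is not None:
        x = max(x, lower)
    if upper is not None:
        x = min(x, upper)
    return x


def loading_posterior(f, resid, sigma2, prior_mean=0.0, prior_var=1.0):
    """Posterior mean and variance of a scalar loading in resid = loading * f + error.

    resid plays the role of T* - X beta, sigma2 is the error variance, and the
    prior is N(prior_mean, prior_var).
    """
    precision = sum(v * v for v in f) / sigma2 + 1.0 / prior_var
    mean = (sum(v * r for v, r in zip(f, resid)) / sigma2 + prior_mean / prior_var) / precision
    return mean, 1.0 / precision


# Hypothetical usage: redraw a latent score known to be at or above a ceiling of 35,
# then update a loading from two factor values and residuals.
rng = random.Random(1)
latent_draw = draw_truncated_normal(30.0, 5.0, lower=35.0, rng=rng)
post_mean, post_var = loading_posterior([1.0, 1.0], [2.0, 2.0], sigma2=1.0)
```

A full Gibbs pass would cycle through all eight steps, feeding each draw into the conditioning set of the next; the two helpers only cover the distributional pieces that recur across steps.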
Table A-1

Sample Means By Final Attainment Status
NLSY, White Males

                                                  By Final Years of Education
Variables                    Overall      HS Dropout   HS Graduate  Some College  College Graduate
No. Observations             2066         330          722          395           619
Urban Dummy                  0.7517       0.7394       0.7105       0.7443        0.8110
                             (0.4321)     (0.4396)     (0.4538)     (0.4368)      (0.3918)
From Broken Home             0.1999       0.4000       0.1814       0.1924        0.1196
                             (0.4000)     (0.4906)     (0.3856)     (0.3947)      (0.3247)
Number of Siblings           2.9942       3.6900       3.0997       3.0152        2.4862
                             (1.9805)     (2.3302)     (1.8728)     (2.1147)      (1.6559)
Southern Dummy               0.2493       0.3818       0.2119       0.2506        0.2213
                             (0.4327)     (0.4866)     (0.4089)     (0.4339)      (0.4155)
Mother's Education           12.1343      10.5019      11.5817      12.2393       13.4105
  (N=1884)                   (2.3326)     (2.3080)     (1.9751)     (1.9270)      (2.2662)
Father's Education           12.4202      10.1341      11.4407      12.6711       14.4179
  (N=1942)                   (3.3135)     (3.0890)     (2.7352)     (2.9093)      (3.1170)
Family Income (Thousands)    22.3244      14.7412      20.7361      22.5783       28.1310
  (N=1695)                   (14.3756)    (9.7335)     (11.6325)    (13.2189)     (17.5746)
Born in First Quarter        0.2464       0.2364       0.2493       0.2658        0.2359
                             (0.4310)     (0.4255)     (0.4329)     (0.4423)      (0.4249)
Born in Second Quarter       0.2483       0.2606       0.2396       0.2329        0.2617
                             (0.4321)     (0.4396)     (0.4271)     (0.4232)      (0.4399)
Born in Third Quarter        0.2672       0.2727       0.2659       0.2709        0.2633
                             (0.4426)     (0.4460)     (0.4421)     (0.4450)      (0.4408)
Behind Peers                 0.3204       0.5455       0.3033       0.3038        0.2310
                             (0.4668)     (0.4987)     (0.4600)     (0.4605)      (0.4218)
Born in 1957                 0.1026       0.0970       0.0983       0.1266        0.0953
                             (0.3035)     (0.2964)     (0.2980)     (0.3329)      (0.2939)
Born in 1958                 0.0978       0.0606       0.0886       0.1139        0.1179
                             (0.2971)     (0.2390)     (0.2844)     (0.3181)      (0.3228)
Born in 1959                 0.1094       0.1000       0.1260       0.1038        0.0985
                             (0.3122)     (0.3005)     (0.3321)     (0.3054)      (0.2983)
Born in 1960                 0.1336       0.1394       0.1482       0.1114        0.1276
                             (0.3403)     (0.3469)     (0.3555)     (0.3150)      (0.3339)
Born in 1961                 0.1317       0.1272       0.1343       0.1392        0.1260
                             (0.3382)     (0.3338)     (0.3413)     (0.3466)      (0.3321)
Born in 1962                 0.1641       0.1667       0.1634       0.1671        0.1616
                             (0.3704)     (0.3732)     (0.3700)     (0.3735)      (0.3683)
Born in 1963                 0.1389       0.1636       0.1316       0.1215        0.1454
                             (0.3459)     (0.3705)     (0.3383)     (0.3271)      (0.3528)
Local Dropout Wage           6.5651       6.5853       6.5993       6.5913        6.4976
                             (1.2256)     (1.3347)     (1.2512)     (1.2165)      (1.1377)
Local Dropout                0.0697       0.0684       0.0718       0.0710        0.0672
  Unemployment Rate          (0.0231)     (0.0237)     (0.0231)     (0.0240)      (0.0219)
Local HS Graduate Wage       7.5509       7.5600       7.5186       7.5742        7.5689
                             (1.4599)     (1.5218)     (1.3438)     (1.4226)      (1.5780)
Local HS Unempl. Rate        0.0573       0.0552       0.0609       0.0588        0.0531
                             (0.0254)     (0.0266)     (0.0254)     (0.0264)      (0.0235)
Local Wage for Some          7.6666       7.6679       7.6692       7.7600        7.6033
  College                    (1.4020)     (1.4245)     (1.4148)     (1.4042)      (1.3733)
Local Unempl. Rate for       0.0371       0.0355       0.0395       0.0378        0.0347
  Some College               (0.0156)     (0.0160)     (0.0155)     (0.0162)      (0.0148)
4-Year College Tuition       19.8694      18.0718      21.3384      19.5033       19.3481
  (tens)                     (7.8463)     (7.2133)     (7.7863)     (8.3439)      (7.6350)

Table A-1, Continued
Sample Means By Final Attainment Status
NLSY, White Males

                                                  By Final Years of Education
Variables                    Overall      HS Dropout   HS Graduate  Some College  College Graduate
Distance to 4-Year College   8.1149       10.4564      8.3852       9.2578        5.8219
                             (16.4639)    (19.9424)    (15.9885)    (16.8611)     (14.3320)
Word Knowledge               26.8930      20.1000      25.1911      28.6886       31.3538
                             (7.03266)    (8.1892)     (6.5495)     (5.0356)      (3.6514)
Paragraph Comprehension      10.9719      7.7848       10.1801      11.8152       13.0565
                             (3.3182)     (3.5822)     (3.2502)     (2.5227)      (1.6166)
Arithmetic Reasoning         19.7333      13.0242      17.6191      20.9747       24.9838
                             (7.2253)     (5.8957)     (6.5290)     (5.9915)      (5.0460)
Math Knowledge               14.6438      8.3272       11.9834      15.5873       20.5121
                             (6.5385)     (4.0018)     (5.0292)     (5.5759)      (4.5122)
Overall AFQT                 72.2420      49.2364      64.9737      77.0658       89.9063
                             (21.6023)    (18.7146)    (18.1901)    (16.0706)     (12.2652)

Table A-2

Sample Means by Education at Test Date
NLSY, White Males

                                        By Years of Education at Test Date (July-October 1980)
Variables                  Overall     ≤9          10          11          HS Graduate  Some College  College Graduate
No. Observations           2066        205         322         343         747          376           73
Urban Dummy                0.7517      0.7171      0.7360      0.7405      0.7349       0.8138        0.8219
                           (0.4321)    (0.4515)    (0.4415)    (0.4390)    (0.4417)     (0.3898)      (0.3852)
From Broken Home           0.1999      0.3805      0.2671      0.2187      0.1620       0.1197        0.1096
                           (0.4000)    (0.4867)    (0.4431)    (0.4139)    (0.3687)     (0.3250)      (0.3145)
Number of Siblings         2.9942      3.7366      2.9534      2.9767      2.9411       2.8830        2.2877
                           (1.9805)    (2.4968)    (1.8897)    (1.9330)    (1.8523)     (2.0375)      (1.3487)
Southern Dummy             0.2493      0.4341      0.2484      0.2449      0.2129       0.2340        0.2055
                           (0.4327)    (0.4969)    (0.4328)    (0.4307)    (0.4096)     (0.4240)      (0.4068)
Mother's Education         12.1343     10.4892     11.7412     12.0709     11.9636      13.0822       13.7455
  (N=1884)                 (2.3326)    (2.4653)    (2.0647)    (2.3630)    (2.1499)     (2.1671)      (2.0925)
Father's Education         12.4202     9.9712      12.057      12.0709     12.0971      13.8783       14.2364
  (N=1942)                 (3.3135)    (3.3274)    (3.2026)    (3.1889)    (3.0071)     (3.1258)      (3.3331)
Family Income (Thous.)     22.3244     15.0775     20.1549     22.7529     22.1551      27.9788       25.9860
  (N=1695)                 (14.3756)   (9.0086)    (12.0373)   (13.1427)   (13.0989)    (18.1897)     (17.8028)
In School at Test Date     0.5034      0.3902      0.7702      0.7055      0.2249       0.7261        0.3973
                           (0.5001)    (0.4890)    (0.4214)    (0.4565)    (0.4178)     (0.4466)      (0.4927)
Born in 1957               0.1026      0.0634      0.0248      0.0321      0.1098       0.1622        0.5068
                           (0.3035)    (0.2443)    (0.1559)    (0.1764)    (0.3128)     (0.3692)      (0.5034)
Born in 1958               0.0978      0.0293      0.0155      0.0350      0.1017       0.1835        0.4658
                           (0.2971)    (0.1690)    (0.1238)    (0.1840)    (0.3025)     (0.3876)      (0.5023)
Born in 1959               0.1094      0.0634      0.0248      0.0437      0.1446       0.2128        0.0274
                           (0.3122)    (0.2443)    (0.1559)    (0.2048)    (0.3519)     (0.4098)      (0.1644)
Born in 1960               0.1336      0.0585      0.0404      0.0816      0.1754       0.2447        0.0000
                           (0.3403)    (0.2353)    (0.1971)    (0.2742)    (0.3805)     (0.4305)      (0.0000)
Born in 1961               0.1317      0.0780      0.0373      0.0816      0.1954       0.1862        0.0000
                           (0.3382)    (0.2689)    (0.1897)    (0.2742)    (0.3968)     (0.3898)      (0.0000)
Born in 1962               0.1641      0.1561      0.0776      0.2391      0.2637       0.0080        0.0000
                           (0.3704)    (0.3638)    (0.2680)    (0.4271)    (0.4409)     (0.0891)      (0.0000)
Born in 1963               0.1389      0.1415      0.2609      0.4840      0.0094       0.0027        0.0000
                           (0.3459)    (0.3494)    (0.4398)    (0.5005)    (0.0964)     (0.0516)      (0.0000)
Word Knowledge             26.8930     18.4488     24.2888     26.2828     27.5636      31.7766       32.9452
                           (7.03266)   (7.2581)    (7.3532)    (6.3737)    (5.9219)     (3.5437)      (2.2292)
Paragraph                  10.9719     7.0878      9.8758      10.9679     11.2503      13.0000       13.4384
  Comprehension            (3.3182)    (3.3302)    (3.6382)    (3.1589)    (2.8090)     (1.8504)      (1.2582)
Arithmetic Reasoning       19.7333     12.0683     17.0870     18.9650     19.9545      25.0532       26.8767
                           (7.2253)    (5.6755)    (6.9550)    (6.7120)    (6.4590)     (4.9727)      (3.7228)
Math Knowledge             14.6438     0.3805      12.7702     13.7843     14.1245      20.1197       21.8493
                           (6.5385)    (0.4867)    (6.3278)    (6.1909)    (5.7016)     (4.7017)      (3.7256)
Overall AFQT               72.2420     3.7366      64.0217     70.0000     72.8929      89.9495       95.1096
                           (21.6023)   (2.4968)    (21.5587)   (19.3215)   (17.8026)    (12.4515)     (8.9047)

Table A-3

ASVAB Test Information: AFQT Subcomponents

                           Word        Paragraph      Arithmetic  Math
                           Knowledge   Comprehension  Reasoning   Knowledge   Overall AFQT
Number of Questions        35          15             11          25          86
Time (in minutes)          11          13             36          24          84
Max. Possible Raw Score    35          15             30          25          105

Fraction of Observations Censored in Each Subsystem

                                                                              At Least 1
                           Word        Paragraph      Arithmetic  Math        AFQT
Schooling at Test Date     Knowledge   Comprehension  Reasoning   Knowledge   Component
9 Years or Less            0.0000      0.0098         0.0049      0.0000      0.0146
10 Years                   0.0155      0.0311         0.0248      0.0217      0.0745
11 Years                   0.0496      0.0787         0.0437      0.0408      0.1370
12 Years/HS Graduate       0.0535      0.0656         0.0348      0.0388      0.1392
Some College               0.2128      0.1649         0.1622      0.1303      0.4521
College Graduate           0.3151      0.1781         0.2740      0.2192      0.5890
Overall                    0.0799      0.0789         0.0634      0.0557      0.1893

Table C-1

Specification of Priors for Reported Structural Model Estimates

Parameter          Prior Distribution                                     Prior Specification
$\gamma(s)$        Noninformative flat prior on nonzero coefficients.
                   Degenerate prior with point mass at zero for
                   restricted coefficients. (See Table 6.)
$\alpha(s)$        $N(\mu_\alpha, \psi_\alpha^2)$                         $\mu_\alpha = 0$, $\psi_\alpha^2 = 1$
$\beta_k(s)$       Noninformative flat prior.
$\lambda_k(s)$     $N(\mu_k(s), \psi_k(s))$                               $\mu_k(s) = 0$, $\psi_k(s) = 1$
$\sigma_k(s)^2$    $IG(a_s, b_s)$                                         $a_s = 2$, $b_s = 1$



Table 1

Non-parametric Estimates of Factor Loadings

                          Comparison Groups (s, s')
               (1,2)    (1,3)    (1,4)    (2,3)    (2,4)    (3,4)    MD*      $\chi^2$
$\lambda(2)$   2.26     1.38     1.00     1.03     0.81     0.68     0.83     6.31
               (1.52)   (0.45)   (0.15)   (0.54)   (0.16)   (0.27)   (0.13)   (p = 0.28)
$\lambda(3)$   0.71     0.73     0.87     0.73     0.89     0.98     0.87     0.83
               (0.72)   (0.27)   (0.13)   (0.40)   (0.16)   (0.34)   (0.12)   (p = 0.98)
$\lambda(4)$                              0.89     0.66     0.52     0.61     3.01
                                          (0.41)   (0.12)   (0.19)   (0.12)   (p = 0.22)
$\lambda(5)$                                                0.56     0.56     (na)
                                                            (0.20)   (0.20)   (na)

*MD = minimum distance
We normalize $\lambda(1) = 1$.



Table 2

Non-parametric Estimates of Intercepts and Control Functions

Intercepts

$\mu(1)$    $\mu(2)$    $\mu(3)$    $\mu(4)$    $\mu(5)$
56.91       65.93       71.63       75.24       82.58
(1.27)      (0.95)      (0.78)      (0.61)      (0.63)

Control Functions

$E[f \mid S = \text{Dropout}]$                 -15.89   (1.15)
$E[f \mid S = \text{High School}]$             -10.70   (0.74)
$E[f \mid S = \text{Some College}]$            1.95     (0.99)
$E[f \mid S = \text{College}]$                 19.70    (0.76)
$E[f \mid S_T = \text{9th Grade or Less}]$     -10.99   (0.95)
$E[f \mid S_T = \text{10th Grade}]$            -2.23    (0.87)
$E[f \mid S_T = \text{11th Grade}]$            -2.23    (0.75)
$E[f \mid S_T = \text{High School}]$           -3.63    (0.56)
$E[f \mid S_T = \text{Some College}]$          13.33    (0.71)
$E[f \mid S_T = \text{College}]$               19.70    (7.33)

$\chi^2$ = 10.61 (p = 0.30)



Table 3

Non-parametric Estimates of Factor Loadings
Controlling for Endogenous Start Date

               MD       $\chi^2$
$\lambda(2)$   1.23     1.13
               (0.22)   (p = 0.98)
$\lambda(3)$   0.96     1.45
               (0.18)   (p = 0.96)
$\lambda(4)$   0.94     0.01
               (0.23)   (p = 0.92)
$\lambda(5)$   0.41     N/A
               (0.25)   N/A
$\lambda(6)$   0.00     N/A
               (0.90)   N/A


Table 4

Non-parametric Estimates of Intercepts and Control Functions
Controlling for Endogenous Start Date

Intercepts

$\mu(1)$    $\mu(2)$    $\mu(3)$    $\mu(4)$    $\mu(5)$    $\mu(6)$
57.13       66.06       72.25       74.62       87.10       95.10
(1.35)      (0.96)      (0.77)      (0.65)      (0.54)      (1.04)

Control Functions

                                               N = Normal          N = Behind
$E[f \mid S = \text{Dropout}, N]$              -11.14  (1.34)      -17.03  (1.38)
$E[f \mid S = \text{High School}, N]$          -5.24   (0.74)      -12.02  (1.15)
$E[f \mid S = \text{Some College}, N]$         2.26    (0.95)      -1.26   (1.73)
$E[f \mid S = \text{College}, N]$              15.80   (0.71)      14.08   (1.38)
$E[f \mid S_T = \text{9th Grade or Less}, N]$  -6.73   (2.26)      -12.45  (1.14)
$E[f \mid S_T = \text{10th Grade}, N]$         0.84    (0.71)      -7.52   (1.33)
$E[f \mid S_T = \text{11th Grade}, N]$         1.21    (0.86)      -8.23   (1.22)
$E[f \mid S_T = \text{High School}, N]$        -0.52   (0.56)      -6.30   (1.07)
$E[f \mid S_T = \text{Some College}, N]$       11.19   (0.98)      7.85    (1.22)
$E[f \mid S_T = \text{College}, N]$            N/A     N/A         N/A     N/A

$\chi^2$ = 37.40 (p = 0.01)
Table 5

Non-parametric Estimates of Intercepts and Control Functions
Allowing for A-Effects in Intercepts and Controlling for Endogenous Start Date

Table 7

$\chi^2$-Statistics for Choice Model
Average Choice Probabilities in Selected Groups

                                 Dropout   HS Grad.  Some Coll.  Coll. Grad.  $\chi^2$-Statistic  P-value
Overall (N=2066)
  Normal/Ahead   Actual          0.0726    0.2435    0.1331      0.2304
                 Predicted       0.0716    0.2452    0.1359      0.2273
  Behind         Actual          0.0871    0.1060    0.0581      0.0692
                 Predicted       0.0843    0.1052    0.0605      0.0697       0.6775              0.9985
Individuals from urban area (N=1553)
  Normal/Ahead   Actual          0.0734    0.2292    0.1301      0.2473
                 Predicted       0.0722    0.2330    0.1335      0.2420
  Behind         Actual          0.0837    0.1011    0.0592      0.0760
                 Predicted       0.0802    0.1007    0.0622      0.0760       0.9032              0.9962
Individuals from rural area (N=513)
  Normal/Ahead   Actual          0.0702    0.2865    0.1423      0.1793
                 Predicted       0.0698    0.2822    0.1433      0.1830
  Behind         Actual          0.0975    0.1209    0.0546      0.0487
                 Predicted       0.0968    0.1188    0.0555      0.0504       0.1345              1.0000
Individuals with less than 3 siblings (N=968)
  Normal/Ahead   Actual          0.0610    0.2231    0.1312      0.3171
                 Predicted       0.0562    0.2383    0.1405      0.2926
  Behind         Actual          0.0589    0.0857    0.0496      0.0733
                 Predicted       0.0607    0.0886    0.0501      0.0721       4.0925              0.7691
Individuals with 3 or more siblings (N=1098)
  Normal/Ahead   Actual          0.0829    0.2614    0.1348      0.1539
                 Predicted       0.0851    0.2513    0.1319      0.1698
  Behind         Actual          0.1120    0.1239    0.0656      0.0656
                 Predicted       0.1052    0.1198    0.0697      0.0675       3.1838              0.8675
Avg. parents' education < 12 years (N=803)
  Normal/Ahead   Actual          0.1270    0.2827    0.1009      0.0934
                 Predicted       0.1234    0.2730    0.1173      0.1204
  Behind         Actual          0.1694    0.1469    0.0523      0.0274
                 Predicted       0.1407    0.1365    0.0556      0.0331       13.3411             0.0642
Avg. parents' education ≥ 12 years (N=1263)
  Normal/Ahead   Actual          0.0380    0.2185    0.1536      0.3175
                 Predicted       0.0386    0.2276    0.1478      0.2953
  Behind         Actual          0.0348    0.0800    0.0618      0.0958
                 Predicted       0.0485    0.0852    0.0636      0.0929       8.3124              0.3059
Four-year College Tuition < $2,000 (N=1008)
  Normal/Ahead   Actual          0.0863    0.2004    0.1290      0.2500
                 Predicted       0.0791    0.2214    0.1364      0.2370
  Behind         Actual          0.1032    0.0982    0.0605      0.0724
                 Predicted       0.0969    0.0995    0.0568      0.0722       4.4722              0.7241
Four-year College Tuition ≥ $2,000 (N=1058)
  Normal/Ahead   Actual          0.0595    0.2845    0.1371      0.2117
                 Predicted       0.0644    0.2679    0.1354      0.2181
  Behind         Actual          0.0718    0.1134    0.0558      0.0662
                 Predicted       0.0723    0.1106    0.0641      0.0672       2.9245              0.8919
Zero Distance to Four-Year College (N=1552)
  Normal/Ahead   Actual          0.0689    0.2397    0.1308      0.2577
                 Predicted       0.0686    0.2398    0.1357      0.2493
  Behind         Actual          0.0754    0.1018    0.0541      0.0715
                 Predicted       0.0772    0.0983    0.0577      0.0730       1.3616              0.9867
Nonzero Distance to Four-Year College (N=514)
  Normal/Ahead   Actual          0.0837    0.2549    0.1401      0.1479
                 Predicted       0.0805    0.2617    0.1365      0.1608
  Behind         Actual          0.1226    0.1187    0.0700      0.0623
                 Predicted       0.1057    0.1259    0.0692      0.0597       2.4023              0.9343

Table 8

$\chi^2$-Statistics for Predicted AFQT Distributions
Conditional on Schooling Level at Test Date

Schooling Level         N       No. bins   $\chi^2$-Statistic   P-value
Ninth Grade or Less     205     8          16.8349              0.0185
Tenth Grade             322     12         7.2653               0.7772
Eleventh Grade          343     13         12.3787              0.4158
High School Graduate    747     26         30.5079              0.2058
Some College            376     13         35.3758              0.0004
College Graduate        73      5          10.3990              0.0342
Overall                 2066    77         112.7616             0.0040

Note: Bins were chosen to include approx. equal numbers of observations
in each cell. No. bins was chosen to average roughly 25-30 people per bin,
except last group due to small size.
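The binning described in the note can be sketched as follows. This is only one plausible reading: the paper does not spell out how bin edges or expected counts were computed, so the quantile-based edges, the Pearson form of the statistic, and all function names and numbers here are assumptions for illustration.

```python
import bisect


def equal_count_edges(values, n_bins):
    """Interior cut points that split the sorted sample into roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def bin_counts(values, edges):
    """Count observations per bin; a value equal to an edge goes to the bin on its right."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return counts


def pearson_chi_square(observed, expected):
    """Sum over bins of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 100 actual scores placed into 4 equal-count bins, compared
# with expected counts implied by some predicted distribution.
actual = list(range(100))
edges = equal_count_edges(actual, 4)
observed = bin_counts(actual, edges)
expected = [30.0, 20.0, 25.0, 25.0]  # hypothetical model-implied counts
stat = pearson_chi_square(observed, expected)
```

The statistic would then be compared with a chi-square reference distribution whose degrees of freedom depend on the number of bins; that step is omitted here because it needs a chi-square CDF, which the Python standard library does not provide.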



Table 9

Estimates from OLS Regression of Log Wage in 1998 on Years of Schooling and Residualized AFQT

Variable                      Coefficient   Std. Error   t        P>|t|    95% Confidence Interval
Years of Schooling in 1998    0.1022        0.0101       10.08    0.000    0.0823      0.1221
Experience                    -0.5657       0.0457       -1.24    0.216    -0.1463     0.0332
Experience²                   0.0028        0.0013       2.23     0.026    0.0003      0.0053
OLS-Residualized AFQT         0.0988        0.0221       4.48     0.000    0.0555      0.1421
Constant                      1.4812        0.4749       3.12     0.002    0.5493      2.4132

Source      SS          Degrees of Freedom   MS
Model       62.4505     4                    15.6126
Residual    249.9216    977                  0.2558

Number of Obs. = 982
F(4, 977) = 61.03
Prob. > F = 0.0000
R² = 0.1999
Adjusted R² = 0.1966
Root MSE = 0.5058

Note: Regressions estimated on observations with nonmissing wages

Table 10

Estimates from OLS Regression of Log Wage in 1998
on Years of Schooling and Schooling Corrected Ability Measure

Variable                      Coefficient   Std. Error   t        P>|t|    95% Confidence Interval
Years of Schooling in 1998    0.1176        0.0094       12.46    0.000    0.0991      0.1362
Experience                    -0.0589       0.0461       -1.28    0.202    -0.1494     0.0316
Experience²                   0.0029        0.0013       2.29     0.022    0.0004      0.0054
f                             0.1483        0.0734       2.02     0.044    0.0042      0.2923
Constant                      1.2708        0.4758       2.67     0.008    0.3371      2.2045

Source      SS          Degrees of Freedom   MS
Model       58.3832     4                    14.5958
Residual    253.9890    977                  0.2600

Number of Obs. = 982
F(4, 977) = 56.14
Prob. > F = 0.0000
R² = 0.1869
Adjusted R² = 0.1836
Root MSE = 0.5099

Note: Regressions estimated on observations with nonmissing wages



Table 11

Estimates from OLS Regression of AFQT Score on Years of Schooling at Test Date

Variable              Coefficient   Std. Error   t          P>|t|     95% Confidence Interval
Years of Schooling    5.5800        0.2867       19.4600    0.0000    5.0177      6.1423
Urban Status          0.2610        0.8569       0.3000     0.7610    -1.4195     1.9415
Broken Home           0.9417        0.9729       0.9700     0.3330    -0.9662     2.8496
Number of Siblings    -0.5773       0.1898       -3.0400    0.0020    -0.9496     -0.2051
Southern              -2.6561       0.8535       -3.1100    0.0020    -4.3298     -0.9823
Mother's Education    1.9069        1.3426       1.4200     0.1560    -0.7261     4.5400
Father's Education    8.0419        1.2635       6.3600     0.0000    5.5640      10.5197
Family Income         0.1250        0.0299       4.1900     0.0000    0.0665      0.1836
Age                   0.1239        0.2562       0.4800     0.6290    -0.3785     0.6264
In School             8.6112        0.9174       9.3900     0.0000    6.8122      10.4103
Constant              -11.9674      4.0363       -2.9600    0.0030    -19.8830    -4.0517

Source      SS            Degrees of Freedom   MS
Model       407339.225    10                   40733.923
Residual    556311.768    2055                 270.711

Number of Obs. = 2066
F(10, 2055) = 150.47
Prob. > F = 0.0000
R² = 0.4227
Adjusted R² = 0.4199
Root MSE = 16.453

Note: Instruments for years of schooling: quarter of birth dummies, urban status, broken home, number of siblings,
southern residence, mother's education, father's education, family income and cohort dummies.

Table 12

Estimates from Instrumental Variables Regression of AFQT Score on Years of Schooling at Test Date

Variable            Coefficient   Std. Error         t    P>|t| 95% Confidence Interval
Years of Schooling       4.5164       0.9203    4.9100   0.0000      2.7117      6.3212
Urban Status             0.2174       0.8605    0.2500   0.8010     -1.4702      1.9050
Broken Home              0.8077       0.9823    0.8200   0.4110     -1.1187      2.7341
Number of Siblings      -0.6708       0.2054   -3.2700   0.0010     -1.0735     -0.2680
Southern                -2.8466       0.8705   -3.2700   0.0010     -4.5538     -1.1393
Mother’s Education       2.2952       1.3844    1.6600   0.0970     -0.4197      5.0102
Father’s Education       8.5196       1.3271    6.4200   0.0000      5.9169     11.1223
Family Income            0.1347       0.0310    4.3400   0.0000      0.0739      0.1955
Age                      0.7564       0.5799    1.3000   0.1920     -0.3809      1.8936
In School                9.6624       1.2624    7.6500   0.0000      7.1868     12.1381
Constant               -13.1093       4.1571   -3.1500   0.0020    -21.2619     -4.9567

Source              SS   Degrees of Freedom          MS
Model       403614.616                   10   40361.462
Residual    560036.378                 2055     272.524

Number of Obs. = 2066
F(10, 2055) = 114.26
Prob. > F = 0.0000
R² = 0.4188
Adjusted R² = 0.4160
Root MSE = 16.508

Note: Instruments for years of schooling: quarter of birth dummies, urban status, broken home, number of siblings,
southern residence, mother’s education, father’s education, family income and cohort dummies.
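
The instrumental variables logic behind a table of this kind can be illustrated with a small two-stage least squares sketch. Everything below is an assumption made for illustration: the data are simulated, the single instrument stands in for the longer instrument list in the note, the "true" schooling effect of 4.5 is a hypothetical value chosen for the simulation, and the helper `two_stage_least_squares` is not the authors' code. The sketch shows only the mechanism: when latent ability drives both schooling and the test score, OLS overstates the schooling coefficient, while an instrument unrelated to ability recovers it.

```python
import numpy as np


def two_stage_least_squares(y, endog, instruments):
    """2SLS with one endogenous regressor; returns [constant, slope].

    Hypothetical helper for illustration only.
    """
    n = y.shape[0]
    const = np.ones((n, 1))
    # First stage: project the endogenous regressor on the instruments.
    Z = np.hstack([const, instruments])
    gamma = np.linalg.lstsq(Z, endog, rcond=None)[0]
    endog_hat = Z @ gamma
    # Second stage: regress the outcome on the fitted values.
    X_hat = np.hstack([const, endog_hat[:, None]])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


# Simulated data (all parameter values are assumptions).
rng = np.random.default_rng(0)
n = 20_000
true_effect = 4.5                      # hypothetical points per year of schooling
ability = rng.normal(size=n)           # latent ability, unobserved by the analyst
instrument = rng.normal(size=n)        # shifts schooling, unrelated to ability
schooling = 12.0 + instrument + ability + rng.normal(size=n)
score = true_effect * schooling + 10.0 * ability + rng.normal(scale=5.0, size=n)

# OLS ignores the dependence of schooling on latent ability.
X = np.column_stack([np.ones(n), schooling])
beta_ols = np.linalg.lstsq(X, score, rcond=None)[0]

beta_iv = two_stage_least_squares(score, schooling, instrument[:, None])

print(f"OLS slope:  {beta_ols[1]:.3f}")
print(f"2SLS slope: {beta_iv[1]:.3f}")
```

In this simulation the OLS slope absorbs part of the ability effect, while the 2SLS slope stays close to the assumed value. The standard errors from a naive second-stage regression are not valid for 2SLS, so the sketch reports point estimates only.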
Figure 1: Actual (Diamond) vs. Predicted (Circle) AFQT Cumulative Distribution Functions Conditional on Schooling at Test Date
(a) Ninth Grade or less (b) Tenth Grade
Figure 2: Estimated Factor (Solid) vs. Residualized AFQT (Dashed) Distributions
Figure 3: Factor Densities Conditional on Age at Entry by Final Schooling Group
(a) High School Dropouts (b) High School Graduates
Figure 4: Estimated Effect of One Additional Year of Age on AFQT Score (With 95% Confidence Bands)
[Plot labels: Years of Schooling; Probability of Choice]
Figure 5: Schooling Choice Probabilities as a Function of Residualized AFQT, No Factor
(a) Normal/Ahead of Peers (b) Behind Peers
Figure 6: Schooling Choice Probabilities as a Function of Factor
(a) Normal/Ahead of Peers (b) Behind Peers
Figure 7: Kernel Estimated Densities of Standardized Residualized AFQT Conditional on Schooling
(a) Normal/Ahead of Peers (b) Behind Peers
Figure 8: Standardized Factor Densities Conditional on Schooling
(a) Normal/Ahead of Peers (b) Behind Peers







Figure 9: Marginal Effect of Standard Deviation Increase in Factor on AFQT Components Conditional on Schooling (With 95% Confidence B 
(a) Word Knowledge (b) Paragraph Comprehension
Figure 10: Marginal Effect of a Standard Deviation Increase in Latent Ability Factor on Overall AFQT Score Conditional on Schooling (With 95% Confidence Bands)
[Plot label: Years of Schooling]
Figure 12: Estimated Annualized School Effects for Person with Average Ability
Figure 13: Comparison of Control Function and Structural Estimates of Ratios of Factor Loadings
Figure 14: Comparison of Control Function vs. Structural Estimates of Expected Test Score (f=0)